Overview

IRAP trial-type D scores are calculated from an average of only 18 pairs of reaction times, a number that would be considered far too low anywhere else in the literature on reaction-time based tasks. The consequence is that any individual IRAP D score is poorly estimated. We can demonstrate this by bootstrapping each participant's trial-type D scores from their reaction times and noting how wide the resulting confidence intervals are.
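As a toy illustration of the core point (not the study data: all values below are simulated with arbitrary means and SDs), the following bootstraps a single D score from 18 consistent-block and 18 inconsistent-block reaction times. With so few trials, the 95% percentile interval is typically very wide.

```r
# Toy simulation: how wide is a bootstrapped 95% CI on a D score
# computed from only 18 + 18 reaction times? All values are simulated.
set.seed(42)

rts <- data.frame(
  block_type = rep(c("con", "incon"), each = 18),
  rt = c(rnorm(18, mean = 1400, sd = 600),   # consistent blocks
         rnorm(18, mean = 1550, sd = 650))   # inconsistent blocks
)

# D score: difference in block means divided by the overall SD
d_score <- function(data, i) {
  d <- data[i, ]
  (mean(d$rt[d$block_type == "incon"]) - mean(d$rt[d$block_type == "con"])) /
    sd(d$rt)
}

# simple percentile bootstrap
boots <- replicate(2000, d_score(rts, sample(nrow(rts), replace = TRUE)))

estimate <- d_score(rts, seq_len(nrow(rts)))
ci       <- quantile(boots, c(0.025, 0.975))
ci_width <- unname(diff(ci))  # wide relative to the plausible range of D
```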

Note on confidence vs credibility intervals

“Credibility interval” is used in several different ways in the literature: sometimes it denotes the difference between frequentist and Bayesian approaches to probability, and other times it denotes whether the interval takes into account not only the variance of the fixed effect in question but also that of the random effects. Here I use the latter sense of the word.

The metafor R package readily produces credibility intervals but typically takes effect sizes as its input, whereas here I use lme4 to fit multilevel/meta-analytic models to the data itself. lme4 and the helper packages built on top of it do not provide a method to produce credibility intervals in the sense that metafor does, so I implemented this myself following the metafor package's code. The key conceptual link between the two is that lme4 models return the SD of the random effect, whereas metafor refers to this quantity as tau and returns its squared value (tau², the between-study variance). These values refer to the same property: “In random-effects meta-analysis, the extent of variation among the effects observed in different studies (between-study variance) is referred to as tau-squared, τ2, or Tau2 (Deeks et al 2008). τ2 is the variance of the effect size parameters across the population of studies and it reflects the variance of the true effect sizes. The square root of this number is referred to as tau (T). T2 and Tau reflect the amount of true heterogeneity. T2 represents the absolute value of the true variance (heterogeneity). T2 is the variance of the true effects while tau (T) is the estimated standard deviation of underlying true effects across studies (Deeks et al 2008). The summary meta-analysis effect and T as standard deviation may be reported in random-effects meta-analysis to describe the distribution of true effects (Borenstein et al 2009).” (Source).
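To make the link concrete, here is a minimal sketch on simulated data (group structure and values are arbitrary) of assembling a credibility interval from an lme4 fit: the random-effect SD plays the role of metafor's tau and is added in quadrature to the fixed effect's standard error, following the approach in metafor::predict.rma.

```r
# Minimal sketch on simulated data: a credibility interval from an lme4 fit
library(lme4)

set.seed(42)
# 20 groups of 30 observations; true between-group SD = 0.3
d <- data.frame(group = rep(seq_len(20), each = 30))
d$y <- rnorm(20, mean = 0.5, sd = 0.3)[d$group] + rnorm(nrow(d), sd = 1)

fit <- lmer(y ~ 1 + (1 | group), data = d)

estimate <- fixef(fit)[["(Intercept)"]]
se       <- coef(summary(fit))["(Intercept)", "Std. Error"]
# lme4's random-effect SD is the same quantity metafor calls tau
tau      <- attr(VarCorr(fit)$group, "stddev")[["(Intercept)"]]

# 95% confidence interval: uncertainty in the mean effect only
ci <- estimate + c(-1.96, 1.96) * se

# 95% credibility interval: additionally reflects between-group heterogeneity
cr <- estimate + c(-1.96, 1.96) * sqrt(se^2 + tau^2)
```

The credibility interval is always at least as wide as the confidence interval, since it adds the (non-negative) tau² term under the square root.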

# dependencies
library(tidyverse)
library(knitr)
library(kableExtra)
library(boot)
library(parallel)
library(bayestestR)
library(patchwork)
library(mdthemes)
library(lme4)
library(sjPlot)
library(emmeans)
library(ggstance)
# library(merTools) called via merTools:: to avoid namespace collisions between MASS and dplyr


# set seed for reproducibility
set.seed(42)

# number of bootstraps
n_boots <- 2000 

# options
options(knitr.table.format = "html") # necessary configuration of tables

# disable scientific notation
options(scipen = 999) 


# function to round all numeric vars in a data frame
round_df <- function(df, n_digits = 3) {
  df %>% mutate_if(is.numeric, round, digits = n_digits)
}


# get data from evaluative IRAPs
data_trial_level <- read_csv("../data/data_trial_level.csv") %>%
  filter(timepoint == "baseline")

# outliers
data_outliers <- data_trial_level %>%
  distinct(unique_id, .keep_all = TRUE) %>%
  select(unique_id, domain, mean_rt) %>%
  mutate(median_mean_rt = median(mean_rt, na.rm = TRUE),
         mad_mean_rt = mad(mean_rt, na.rm = TRUE)) %>%
  # flag mean RTs outside median ± 2 * MAD as outliers
  mutate(rt_outlier = ifelse(mean_rt < median_mean_rt-mad_mean_rt*2 |
                               mean_rt > median_mean_rt+mad_mean_rt*2, TRUE, FALSE)) %>%
  filter(rt_outlier == FALSE) %>%
  select(unique_id, rt_outlier) %>%
  full_join(data_trial_level, by = "unique_id") %>%
  mutate(rt_outlier = ifelse(is.na(rt_outlier), TRUE, rt_outlier))

data_outliers_removed <- data_outliers %>%
  filter(rt_outlier == FALSE)

# trim RTs>10000 ms, as part of D scoring
data_trimmed <- data_outliers_removed %>%
  select(unique_id, domain, trial_type, rt, block_type) %>%
  filter(rt <= 10000)

# for plot
dodge_width <- 0.25

Descriptives

data_outliers %>%
  distinct(unique_id, .keep_all = TRUE) %>%
  count(rt_outlier) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
rt_outlier n
FALSE 1464
TRUE 110
data_descriptives <- data_outliers_removed %>%
  distinct(unique_id, .keep_all = TRUE)

data_descriptives %>%
  count(domain) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
domain n
Body image 23
Clinton-Trump 97
Countries (1) 53
Countries (2) 50
Death (1) 17
Death (2) 20
Death (3) 26
Disgust (1) 37
Disgust (2) 43
Friend-Enemy 98
Gender (1) 41
Gender (2) 95
Lincoln-Hitler 131
Personality - Agreeableness 39
Personality - Conscientiousness 39
Personality - Extraversion 33
Personality - Neuroticism 32
Personality - Openness 33
Race (1) 45
Race (2) 44
Religion 30
Rich-Poor 84
Sexuality (1) 26
Sexuality (2) 19
Shapes & colors (1) 11
Shapes & colors (2) 11
Shapes & colors (3) 10
Shapes & colors (4) 8
Shapes & colors (5) 22
Shapes & colors (6) 26
Shapes & colors (7) 14
Stigma - ADHD 62
Stigma - PTSD 54
Stigma - Schizophrenia 54
Valenced words 37
data_descriptives %>%
  count(domain) %>%
  summarize(total_n = sum(n),
            min_n_per_domain = min(n),
            max_n_per_domain = max(n),
            mean_n_per_domain = round(mean(n, na.rm = TRUE), 2),
            sd_n_per_domain = round(sd(n, na.rm = TRUE), 2)) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
total_n min_n_per_domain max_n_per_domain mean_n_per_domain sd_n_per_domain
1464 8 131 41.83 28.73
data_descriptives %>%
  summarize(min_age = round(min(age, na.rm = TRUE), 2),
            max_age = round(max(age, na.rm = TRUE), 2),
            mean_age = round(mean(age, na.rm = TRUE), 2),
            sd_age = round(sd(age, na.rm = TRUE), 2)) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
min_age max_age mean_age sd_age
16 60 20.1 4.39
data_descriptives %>%
  count(gender) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
gender n
Female 221
Male 134
Other 1
NA 1108

Distribution of RTs

ggplot(data_trimmed, aes(rt, fill = block_type)) +
  geom_density(alpha = 0.3) +
  facet_wrap(~ trial_type)

data_trimmed %>%
  group_by(trial_type, block_type) %>%
  summarize(mean = mean(rt, na.rm = TRUE),
            sd = sd(rt, na.rm = TRUE)) %>%
  round_df(0) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
trial_type block_type mean sd
tt1 con 1395 567
tt1 incon 1560 637
tt2 con 1570 658
tt2 incon 1633 687
tt3 con 1560 639
tt3 incon 1520 670
tt4 con 1598 662
tt4 incon 1634 695

Bootstrap 95% CIs on D scores

Calculate scores

D scores calculated by trial-type

# bootstrapping has a long execution time, so load saved values if they've already been calculated
if(file.exists("models/data_estimates_D.rds")) {
  
  data_estimates_D <- read_rds("models/data_estimates_D.rds")
  
} else {
  
  D_score <- function(data, i) {
    data_with_indexes <- data[i,] # boot function requires data and index
    mean_con <- mean(data_with_indexes$rt[data_with_indexes$block_type == "con"], na.rm = TRUE)
    mean_incon <- mean(data_with_indexes$rt[data_with_indexes$block_type == "incon"], na.rm = TRUE)
    sd <- sd(data_with_indexes$rt, na.rm = TRUE)
    D <- (mean_incon - mean_con) / sd
    return(D)
  }
  
  bootstrap_D_score <- function(data){
    
    require(dplyr)
    require(boot)
    
    fit <- 
      boot::boot(data      = data, 
                 statistic = D_score, 
                 R         = n_boots, 
                 sim       = "ordinary", 
                 stype     = "i",
                 parallel  = "multicore", 
                 ncpus     = parallel::detectCores())
    
    results <- boot::boot.ci(fit, conf = 0.95, type = c("norm","basic", "perc", "bca"))
    
    output <- 
      tibble(method   = c("normal", "basic", "percent", "bca"),
             estimate = rep(fit$t0, 4),
             ci_lower = c(results$normal[2], results$basic[4], results$percent[4], results$bca[4]),
             ci_upper = c(results$normal[3], results$basic[5], results$percent[5], results$bca[5]))
    
    return(output)
  }
  
  # bootstrap D scores 
  data_estimates_D <- data_trimmed %>%
    select(unique_id, domain, trial_type, rt, block_type) %>%
    group_by(unique_id, domain, trial_type) %>%
    do(bootstrap_D_score(data = .)) %>%
    ungroup() %>%
    mutate(sig = ifelse((ci_lower < 0 & ci_upper < 0) | (ci_lower > 0 & ci_upper > 0), TRUE, FALSE),
           ci_width = ci_upper - ci_lower) %>%
    round_df(3)
  
  # save to disk
  write_rds(data_estimates_D, "models/data_estimates_D.rds", compress = "gz")
  
}

PI scores calculated by trial-type

# bootstrapping has a long execution time, so load saved values if they've already been calculated
if(file.exists("models/data_estimates_PI.rds")) {
  
  data_estimates_PI <- read_rds("models/data_estimates_PI.rds")
  
} else {
  
  # Fast calculation of the A statistic - code from Ruscio (2008) supplementary materials
  PI_score <- function(data, i) {
    data_with_indexes <- data[i,] # boot function requires data and index
    x  <- na.omit(data_with_indexes$rt[data_with_indexes$block_type == "incon"])
    y  <- na.omit(data_with_indexes$rt[data_with_indexes$block_type == "con"])
    nx <- length(x)
    ny <- length(y)
    rx <- sum(rank(c(x, y))[1:nx])
    PI <- (rx / nx - (nx + 1) / 2) / ny
    return(PI)
  }
  
  bootstrap_PI_score <- function(data){
    
    require(dplyr)
    require(boot)
    
    fit <- 
      boot::boot(data      = data, 
                 statistic = PI_score, 
                 R         = n_boots, 
                 sim       = "ordinary", 
                 stype     = "i",
                 parallel  = "multicore", 
                 ncpus     = parallel::detectCores())
    
    results <- boot::boot.ci(fit, conf = 0.95, type = c("norm","basic", "perc", "bca"))
    
    output <- 
      tibble(method   = c("normal", "basic", "percent", "bca"),
             estimate = rep(fit$t0, 4),
             ci_lower = c(results$normal[2], results$basic[4], results$percent[4], results$bca[4]),
             ci_upper = c(results$normal[3], results$basic[5], results$percent[5], results$bca[5]))
    
    return(output)
  }
  
  # bootstrap PI scores 
  data_estimates_PI <- data_outliers_removed %>%
    group_by(unique_id, domain, trial_type) %>%
    do(bootstrap_PI_score(data = .)) %>%
    ungroup() %>%
    mutate(sig = ifelse((ci_lower < 0.50 & ci_upper < 0.50) | (ci_lower > 0.50 & ci_upper > 0.50), TRUE, FALSE),
           ci_width = ci_upper - ci_lower) %>%
    round_df(3)
  
  # save to disk
  write_rds(data_estimates_PI, "models/data_estimates_PI.rds", compress = "gz")
  
}

Plots

By bootstrapping method

data_estimates_D %>%
  arrange(estimate) %>%
  mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent")) %>%
  group_by(method) %>%
  mutate(ordered_id = row_number()/n()) %>%
  ungroup() %>%
  ggplot() +
  geom_linerange(aes(x = ordered_id, ymin = ci_lower, ymax = ci_upper, color = sig),
                 alpha = 1) +
  geom_point(aes(ordered_id, estimate), size = 0.5) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  mdthemes::md_theme_linedraw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        legend.position = "top") +
  scale_color_viridis_d(end = 0.6, direction = -1) +
  xlab("Participant (ranked by *D* score)") +
  ylab("*D* score") +
  labs(color = "95% CI excludes zero point") + 
  facet_wrap(~ method, ncol = 5)

By domain

p_cis_by_domain <- 
  data_estimates_D %>%
  filter(method == "bca") %>%
  mutate(domain = str_replace(domain, "Personality - ", "Big5: "),
         domain = str_replace(domain, "Stigma - ", "Stigma: ")) %>%
  arrange(estimate) %>%
  group_by(domain) %>%
  mutate(ordered_id = row_number()/n()) %>%
  ungroup() %>%
  ggplot() +
  geom_linerange(aes(x = ordered_id, ymin = ci_lower, ymax = ci_upper, color = sig),
                 alpha = 1) +
  geom_point(aes(ordered_id, estimate), size = 0.5, shape = "square") +
  geom_hline(yintercept = 0, linetype = "dotted") +
  mdthemes::md_theme_linedraw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        legend.position = "top") +
  scale_color_viridis_d(end = 0.6, direction = -1) +
  xlab("Ranked participant") +
  ylab("*D* score") +
  labs(color = "95% CI excludes zero point") + 
  facet_wrap(~ domain, ncol = 5)

p_cis_by_domain

CI widths

CI widths can't be directly compared between D and PI scores, as the two metrics have different ranges, so only D scores are examined here.

These widths are not meta-analyzed: the extreme skew in the data makes the residuals highly non-normal, violating the model's assumptions and underestimating the MAP estimates. Instead, I simply present MAP estimates for each trial type and method, and then for each combination of domain, trial type, and method.

Descriptives

full_join(data_estimates_D %>%
            mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent")) %>%
            group_by(method) %>%
            do(point_estimate(.$ci_width, centrality = "MAP")),
          data_estimates_D %>%
            mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent")) %>%
            group_by(method) %>%
            summarize(min = min(ci_width),
                      max = max(ci_width),
                      .groups = "drop"),
          by = "method") %>%
  round_df(2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
method MAP min max
bca 1.31 0.34 1.97
basic 1.31 0.39 1.75
normal 1.32 0.37 2.15
percent 1.31 0.39 1.75
data_estimates_D %>%
  mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent")) %>%
  group_by(trial_type, method) %>%
  do(point_estimate(.$ci_width, centrality = "MAP")) %>%
  round_df(2) %>%
  pivot_wider(names_from = method, values_from = MAP) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
trial_type bca basic normal percent
tt1 1.31 1.31 1.32 1.31
tt2 1.31 1.31 1.32 1.31
tt3 1.31 1.31 1.32 1.31
tt4 1.31 1.31 1.32 1.31
data_estimates_D %>%
  mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent")) %>%
  summarize(min_Dscore = min(estimate),
            max_Dscore = max(estimate)) %>%
  mutate(range_Dscore = max_Dscore - min_Dscore) %>% 
  round_df(2) 
## # A tibble: 1 x 3
##   min_Dscore max_Dscore range_Dscore
##        <dbl>      <dbl>        <dbl>
## 1      -1.27       1.53          2.8

Plot by domain

data_ci_width_map_D <- data_estimates_D %>%
  group_by(domain, trial_type, method) %>%
  #summarize(median = median(ci_width), .groups = "drop") %>%
  do(point_estimate(.$ci_width, centrality = "MAP")) %>%
  ungroup() %>%
  mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent"),
         method = fct_rev(method),
         trial_type = case_when(trial_type == "tt1" ~ "Trial type 1",
                                trial_type == "tt2" ~ "Trial type 2",
                                trial_type == "tt3" ~ "Trial type 3",
                                trial_type == "tt4" ~ "Trial type 4"),
         trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4")) %>%
  mutate(domain = fct_reorder(domain, MAP, .fun = min)) %>%
  arrange(domain)

# save to disk
write_rds(data_ci_width_map_D, "models/data_ci_width_map_D.rds")

# plot
p_ci_widths <- 
  ggplot(data_ci_width_map_D, aes(MAP, domain, color = method, shape = method)) + 
  geom_line() +
  geom_point() +
  scale_shape_manual(labels = c("BCA", "Basic", "Normal", "Percentile"), values = c(15, 16, 17, 7)) +
  scale_color_viridis_d(begin = 0.2, end = 0.8, labels = c("BCA", "Basic", "Normal", "Percentile")) +
  mdthemes::md_theme_linedraw() +
  facet_wrap(~ trial_type, ncol = 4, nrow = 1) +
  labs(x = "MAP 95% CI width",
       y = "",
       color = "Bootstrap method",
       shape = "Bootstrap method") + 
  theme(legend.position = "top")

p_ci_widths

Proportion different from zero

Calculate scores

data_diff_zero <- 
  bind_rows(mutate(data_estimates_D, DV_type = "*D* scores"),
            mutate(data_estimates_PI, DV_type = "PI scores")) %>%
  mutate(domain = as.factor(domain),
         method = fct_relevel(method, "bca", "basic", "normal", "percent"),
         method = fct_rev(method),
         trial_type = case_when(trial_type == "tt1" ~ "Trial type 1",
                                trial_type == "tt2" ~ "Trial type 2",
                                trial_type == "tt3" ~ "Trial type 3",
                                trial_type == "tt4" ~ "Trial type 4"),
         trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4")) %>%
  group_by(domain, trial_type, method, DV_type) %>%
  summarize(proportion_diff_zero = mean(sig),
            variance = plotrix::std.error(sig)^2,
            .groups = "drop") %>%
  mutate(variance = ifelse(variance == 0, 0.000001, variance))  # model cannot be run on zero variance, so offset by a minuscule amount

# save to disk
write_rds(data_diff_zero, "models/data_diff_zero.rds")

Plots

Compare scoring methods

data_diff_zero %>%
  filter(method == "bca") %>%
  mutate(DV_type = case_when(DV_type == "*D* scores" ~ "D",
                             DV_type == "PI scores" ~ "PI")) %>%
  select(-variance) %>%
  pivot_wider(names_from = c(trial_type, DV_type, method),
              values_from = proportion_diff_zero) %>%
  round_df(2) %>%
  janitor::clean_names() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
domain trial_type_1_d_bca trial_type_1_pi_bca trial_type_2_d_bca trial_type_2_pi_bca trial_type_3_d_bca trial_type_3_pi_bca trial_type_4_d_bca trial_type_4_pi_bca
Body image 0.17 0.17 0.09 0.09 0.04 0.13 0.22 0.13
Clinton-Trump 0.20 0.25 0.08 0.07 0.18 0.18 0.08 0.05
Countries (1) 0.21 0.34 0.15 0.19 0.13 0.08 0.00 0.02
Countries (2) 0.08 0.14 0.12 0.20 0.08 0.12 0.06 0.06
Death (1) 0.06 0.06 0.12 0.12 0.29 0.18 0.06 0.06
Death (2) 0.10 0.10 0.10 0.10 0.10 0.15 0.10 0.20
Death (3) 0.38 0.31 0.15 0.19 0.12 0.08 0.15 0.19
Disgust (1) 0.19 0.24 0.11 0.11 0.30 0.24 0.32 0.30
Disgust (2) 0.23 0.26 0.21 0.23 0.12 0.07 0.33 0.35
Friend-Enemy 0.31 0.36 0.18 0.18 0.04 0.09 0.08 0.11
Gender (1) 0.07 0.10 0.00 0.05 0.10 0.07 0.17 0.20
Gender (2) 0.19 0.17 0.03 0.03 0.17 0.18 0.06 0.03
Lincoln-Hitler 0.29 0.32 0.11 0.11 0.08 0.11 0.11 0.11
Personality - Agreeableness 0.31 0.31 0.13 0.21 0.05 0.13 0.15 0.15
Personality - Conscientiousness 0.38 0.41 0.03 0.00 0.10 0.10 0.10 0.21
Personality - Extraversion 0.15 0.15 0.09 0.06 0.21 0.21 0.03 0.00
Personality - Neuroticism 0.25 0.28 0.06 0.03 0.22 0.16 0.16 0.16
Personality - Openness 0.12 0.09 0.12 0.06 0.06 0.03 0.15 0.12
Race (1) 0.16 0.18 0.02 0.02 0.16 0.16 0.09 0.09
Race (2) 0.36 0.43 0.07 0.11 0.14 0.20 0.11 0.09
Religion 0.17 0.17 0.10 0.07 0.17 0.20 0.13 0.07
Rich-Poor 0.18 0.25 0.11 0.08 0.12 0.19 0.07 0.11
Sexuality (1) 0.42 0.50 0.04 0.15 0.27 0.35 0.15 0.12
Sexuality (2) 0.58 0.58 0.37 0.37 0.42 0.47 0.26 0.37
Shapes & colors (1) 0.18 0.27 0.18 0.18 0.00 0.09 0.00 0.00
Shapes & colors (2) 0.18 0.36 0.09 0.09 0.00 0.00 0.09 0.18
Shapes & colors (3) 0.10 0.10 0.10 0.10 0.30 0.30 0.20 0.30
Shapes & colors (4) 0.25 0.38 0.00 0.00 0.00 0.12 0.25 0.12
Shapes & colors (5) 0.36 0.50 0.00 0.05 0.00 0.09 0.00 0.09
Shapes & colors (6) 0.23 0.23 0.12 0.19 0.08 0.08 0.27 0.31
Shapes & colors (7) 0.57 0.50 0.14 0.21 0.29 0.29 0.36 0.36
Stigma - ADHD 0.16 0.27 0.08 0.08 0.13 0.16 0.11 0.18
Stigma - PTSD 0.30 0.31 0.11 0.07 0.09 0.20 0.11 0.09
Stigma - Schizophrenia 0.13 0.22 0.11 0.11 0.22 0.30 0.15 0.13
Valenced words 0.41 0.49 0.14 0.11 0.08 0.14 0.08 0.11
p_diff_zero_scoring <- 
  data_diff_zero %>%
  filter(method == "bca") %>%
  mutate(domain = fct_reorder(domain, proportion_diff_zero, .fun = mean)) %>%
  ggplot(aes(proportion_diff_zero, domain, color = DV_type, shape = DV_type)) +
  geom_linerangeh(aes(xmin = proportion_diff_zero - sqrt(variance)*1.96,
                      xmax = proportion_diff_zero + sqrt(variance)*1.96),
                  position = position_dodge(width = 1)) + 
  geom_point(position = position_dodge(width = 1)) +
  scale_shape_manual(labels = c("*D* scores", "PI scores"), values = c(15, 16)) +
  scale_color_viridis_d(begin = 0.3, end = 0.7, labels = c("*D* scores", "PI scores")) +
  mdthemes::md_theme_linedraw() +
  facet_wrap(~ trial_type, ncol = 4) +
  labs(x = "Proportion of scores different from zero point",
       y = "",
       color = "Scoring method",
       shape = "Scoring method") + 
  theme(legend.position = "top")

p_diff_zero_scoring

Compare CI bootstrapping methods

p_diff_zero_method <- 
  data_diff_zero %>%
  filter(DV_type == "*D* scores") %>%
  mutate(domain = fct_reorder(domain, proportion_diff_zero, .fun = mean)) %>%
  ggplot(aes(proportion_diff_zero, domain, color = method, shape = method)) +
  geom_linerangeh(aes(xmin = proportion_diff_zero - sqrt(variance)*1.96,
                      xmax = proportion_diff_zero + sqrt(variance)*1.96),
                  position = position_dodge(width = 1)) + 
  geom_point(position = position_dodge(width = 1)) +
  scale_shape_manual(labels = c("BCA", "Basic", "Normal", "Percentile"), values = c(15, 16, 17, 7)) +
  scale_color_viridis_d(begin = 0.2, end = 0.8, labels = c("BCA", "Basic", "Normal", "Percentile")) +
  mdthemes::md_theme_linedraw() +
  facet_wrap(~ trial_type, ncol = 4) +
  labs(x = "Proportion of scores different from zero point",
       y = "",
       color = "Bootstrapping method",
       shape = "Bootstrapping method") + 
  theme(legend.position = "top")

p_diff_zero_method

Model

Compare scoring methods

# fit model
fit_diff_zero_scoring <- 
  data_diff_zero %>%
  filter(method == "bca") %>%
  lmer(proportion_diff_zero ~ 1 + trial_type * DV_type + (trial_type | domain),
       weights = 1/variance, 
       data = .)

# results table
tab_model(fit_diff_zero_scoring, 
          string.p = "p (corrected)", 
          ci.hyphen = ", ",
          #emph.p = FALSE,
          p.adjust = "holm")
  proportion diff zero
Predictors Estimates CI p (corrected)
(Intercept) 0.24 0.19, 0.28 <0.001
trial_type [Trial type 2] -0.14 -0.18, -0.09 <0.001
trial_type [Trial type 3] -0.10 -0.15, -0.05 <0.001
trial_type [Trial type 4] -0.11 -0.15, -0.06 <0.001
DV_type PI scores 0.04 0.02, 0.06 0.001
trial_type [Trial type 2] * DV_type PI scores -0.04 -0.06, -0.02 0.001
trial_type [Trial type 3] * DV_type PI scores -0.04 -0.06, -0.02 0.001
trial_type [Trial type 4] * DV_type PI scores -0.04 -0.06, -0.02 0.001
Random Effects
σ2 0.42
τ00 domain 0.01
τ11 domain.trial_typeTrial type 2 0.01
τ11 domain.trial_typeTrial type 3 0.02
τ11 domain.trial_typeTrial type 4 0.02
ρ01 -0.85, -0.76, -0.76
ICC 0.02
N domain 35
Observations 280
Marginal R2 / Conditional R2 0.008 / 0.029
# summary
summary(fit_diff_zero_scoring)
## Linear mixed model fit by REML ['lmerMod']
## Formula: proportion_diff_zero ~ 1 + trial_type * DV_type + (trial_type |  
##     domain)
##    Data: .
## Weights: 1/variance
## 
## REML criterion at convergence: -669.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.45472 -0.29090  0.08845  0.66833  2.23314 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr             
##  domain   (Intercept)            0.01478  0.1216                    
##           trial_typeTrial type 2 0.01408  0.1187   -0.85            
##           trial_typeTrial type 3 0.01864  0.1365   -0.76  0.74      
##           trial_typeTrial type 4 0.01698  0.1303   -0.76  0.78  0.73
##  Residual                        0.42016  0.6482                    
## Number of obs: 280, groups:  domain, 35
## 
## Fixed effects:
##                                          Estimate Std. Error t value
## (Intercept)                              0.235973   0.021956  10.747
## trial_typeTrial type 2                  -0.136759   0.021897  -6.246
## trial_typeTrial type 3                  -0.101478   0.024771  -4.097
## trial_typeTrial type 4                  -0.107244   0.023846  -4.497
## DV_typePI scores                         0.036650   0.009736   3.764
## trial_typeTrial type 2:DV_typePI scores -0.036574   0.009778  -3.740
## trial_typeTrial type 3:DV_typePI scores -0.036282   0.009779  -3.710
## trial_typeTrial type 4:DV_typePI scores -0.036576   0.009778  -3.740
## 
## Correlation of Fixed Effects:
##             (Intr) tr_Tt2 tr_Tt3 tr_Tt4 DV_PIs t_Tt2s t_Tt3s
## trl_typTrt2 -0.854                                          
## trl_typTrt3 -0.770  0.743                                   
## trl_typTrt4 -0.770  0.775  0.731                            
## DV_typPIscr -0.210  0.211  0.186  0.194                     
## t_Tt2:DV_Ps  0.210 -0.212 -0.186 -0.193 -0.996              
## t_Tt3:DV_Ps  0.210 -0.210 -0.187 -0.193 -0.996  0.991       
## t_Tt4:DV_Ps  0.210 -0.210 -0.186 -0.195 -0.996  0.991  0.991
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.00279646 (tol = 0.002, component 1)
# plot RE effects
plot_model(fit_diff_zero_scoring, 
           colors = "bw",
           type = "re") +
  mdthemes::md_theme_linedraw() +
  ggtitle("")

# extract re Tau
results_re_tau_diff_zero_scoring <- fit_diff_zero_scoring %>%
  merTools::REsdExtract() %>%
  as_tibble(rownames = "trial_type") %>%
  rename(tau = value) %>%
  mutate(trial_type = case_when(trial_type == "domain.(Intercept)" ~ "Trial type 1",
                                trial_type == "domain.trial_typeTrial type 2" ~ "Trial type 2", 
                                trial_type == "domain.trial_typeTrial type 3" ~ "Trial type 3", 
                                trial_type == "domain.trial_typeTrial type 4" ~ "Trial type 4"))

# extract marginal means
results_emm_diff_zero_scoring <- 
  summary(emmeans(fit_diff_zero_scoring, ~ DV_type | trial_type)) %>%
  dplyr::select(DV_type, trial_type, estimate = emmean, se = SE, ci_lower = lower.CL, ci_upper = upper.CL) %>%
  mutate(trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4"))

# combine
results_diff_zero_scoring <- results_emm_diff_zero_scoring %>%
  full_join(results_re_tau_diff_zero_scoring, by = "trial_type") %>%
  mutate(cr_lower = estimate - (1.96 * sqrt(se^2 + tau^2)),  # as in metafor package's implementation of credibility intervals, see metafor::predict.rma.R 
         cr_upper = estimate + (1.96 * sqrt(se^2 + tau^2)))

# save to disk
write_rds(results_diff_zero_scoring, "models/results_diff_zero_scoring.rds")

# tests
emmeans(fit_diff_zero_scoring, list(pairwise ~ DV_type | trial_type), adjust = "bonferroni")
## $`emmeans of DV_type | trial_type`
## trial_type = Trial type 1:
##  DV_type    emmean     SE    df lower.CL upper.CL
##  *D* scores 0.2360 0.0222  6625   0.1863    0.286
##  PI scores  0.2726 0.0223  6628   0.2227    0.323
## 
## trial_type = Trial type 2:
##  DV_type    emmean     SE    df lower.CL upper.CL
##  *D* scores 0.0992 0.0119 25313   0.0725    0.126
##  PI scores  0.0993 0.0119 25318   0.0726    0.126
## 
## trial_type = Trial type 3:
##  DV_type    emmean     SE    df lower.CL upper.CL
##  *D* scores 0.1345 0.0162 13980   0.0982    0.171
##  PI scores  0.1349 0.0162 13982   0.0986    0.171
## 
## trial_type = Trial type 4:
##  DV_type    emmean     SE    df lower.CL upper.CL
##  *D* scores 0.1287 0.0157 13415   0.0935    0.164
##  PI scores  0.1288 0.0157 13416   0.0935    0.164
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 2 estimates 
## 
## $`pairwise differences of DV_type | trial_type`
## trial_type = Trial type 1:
##  contrast                 estimate       SE           df t.ratio p.value
##  *D* scores - PI scores -0.0366504 0.009737      8979469 -3.764  0.0002 
## 
## trial_type = Trial type 2:
##  contrast                 estimate       SE           df t.ratio p.value
##  *D* scores - PI scores -0.0000768 0.000906 119717151791 -0.085  0.9325 
## 
## trial_type = Trial type 3:
##  contrast                 estimate       SE           df t.ratio p.value
##  *D* scores - PI scores -0.0003681 0.000910 117400475158 -0.404  0.6859 
## 
## trial_type = Trial type 4:
##  contrast                 estimate       SE           df t.ratio p.value
##  *D* scores - PI scores -0.0000744 0.000906 119551896105 -0.082  0.9346 
## 
## Degrees-of-freedom method: kenward-roger
emmeans(fit_diff_zero_scoring, list(pairwise ~ trial_type | DV_type), adjust = "bonferroni")
## $`emmeans of trial_type | DV_type`
## DV_type = *D* scores:
##  trial_type   emmean     SE    df lower.CL upper.CL
##  Trial type 1 0.2360 0.0222  6625   0.1806    0.291
##  Trial type 2 0.0992 0.0119 25313   0.0695    0.129
##  Trial type 3 0.1345 0.0162 13980   0.0941    0.175
##  Trial type 4 0.1287 0.0157 13415   0.0894    0.168
## 
## DV_type = PI scores:
##  trial_type   emmean     SE    df lower.CL upper.CL
##  Trial type 1 0.2726 0.0223  6628   0.2170    0.328
##  Trial type 2 0.0993 0.0119 25318   0.0695    0.129
##  Trial type 3 0.1349 0.0162 13982   0.0945    0.175
##  Trial type 4 0.1288 0.0157 13416   0.0895    0.168
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of trial_type | DV_type`
## DV_type = *D* scores:
##  contrast                    estimate     SE    df t.ratio p.value
##  Trial type 1 - Trial type 2  0.13676 0.0220 13560  6.203  <.0001 
##  Trial type 1 - Trial type 3  0.10148 0.0249 13817  4.069  0.0003 
##  Trial type 1 - Trial type 4  0.10724 0.0240 15081  4.468  <.0001 
##  Trial type 2 - Trial type 3 -0.03528 0.0170 46982 -2.074  0.2283 
##  Trial type 2 - Trial type 4 -0.02951 0.0155 61762 -1.904  0.3416 
##  Trial type 3 - Trial type 4  0.00577 0.0179 50978  0.322  1.0000 
## 
## DV_type = PI scores:
##  contrast                    estimate     SE    df t.ratio p.value
##  Trial type 1 - Trial type 2  0.17333 0.0222 13508  7.822  <.0001 
##  Trial type 1 - Trial type 3  0.13776 0.0250 13768  5.501  <.0001 
##  Trial type 1 - Trial type 4  0.14382 0.0241 15020  5.966  <.0001 
##  Trial type 2 - Trial type 3 -0.03557 0.0170 46980 -2.091  0.2192 
##  Trial type 2 - Trial type 4 -0.02951 0.0155 61768 -1.903  0.3419 
##  Trial type 3 - Trial type 4  0.00606 0.0179 50977  0.338  1.0000 
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 6 tests
# plot
dodge_width <- 0.8

p_prop_nonzero_scoring <- 
  ggplot(results_diff_zero_scoring, aes(fct_rev(trial_type), estimate, color = DV_type, shape = DV_type, group = DV_type)) +
  geom_linerange(aes(ymin = cr_lower, ymax = cr_upper), position = position_dodge(width = dodge_width), linetype = "dotted") +
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), size = 1, position = position_dodge(width = dodge_width)) +
  # geom_line(position = position_dodge(width = dodge_width)) +
  geom_point(position = position_dodge(width = dodge_width), size = 2.5) +
  mdthemes::md_theme_linedraw() +
  scale_y_continuous(breaks = c(0, .25, .5, .75, 1), labels = c("0.00<br/>(Worse)", "0.25", "0.50", "0.75", "1.00<br/>(Better)")) +
  scale_color_viridis_d(alpha = 1, begin = 0.3, end = 0.7, labels = c("*D* scores", "PI scores")) +
  labs(shape = "Scoring method",
       color = "Scoring method",
       x = "",
       y = "Proportion of participants with non-zero scores<br/>") + 
  theme(legend.position = "top") +
  coord_flip(ylim = c(0, 1))

p_prop_nonzero_scoring

Compare CI bootstrapping methods

# fit model
fit_diff_zero_method <- 
  data_diff_zero %>%
  filter(DV_type == "*D* scores") %>%
  lmer(proportion_diff_zero ~ 1 + trial_type * method + (trial_type | domain),
       weights = 1/variance, 
       data = .)

# results table
tab_model(fit_diff_zero_method, 
          string.p = "p (corrected)", 
          ci.hyphen = ", ",
          #emph.p = FALSE,
          p.adjust = "holm")
  proportion diff zero

Predictors                                    Estimates  CI            p (corrected)
(Intercept)                                    0.31       0.26, 0.35    <0.001
trial_type [Trial type 2]                     -0.19      -0.24, -0.15   <0.001
trial_type [Trial type 3]                     -0.16      -0.22, -0.10   <0.001
trial_type [Trial type 4]                     -0.13      -0.18, -0.08   <0.001
method [normal]                                0.03       0.01, 0.05    0.007
method [basic]                                 0.06       0.05, 0.08    <0.001
method [bca]                                  -0.07      -0.08, -0.05   <0.001
trial_type [Trial type 2] * method [normal]    0.03       0.00, 0.05    0.068
trial_type [Trial type 3] * method [normal]    0.04       0.01, 0.06    0.010
trial_type [Trial type 4] * method [normal]   -0.01      -0.03, 0.02    0.996
trial_type [Trial type 2] * method [basic]     0.03       0.01, 0.06    0.024
trial_type [Trial type 3] * method [basic]     0.04       0.02, 0.06    0.009
trial_type [Trial type 4] * method [basic]    -0.01      -0.03, 0.02    0.996
trial_type [Trial type 2] * method [bca]       0.07       0.05, 0.08    <0.001
trial_type [Trial type 3] * method [bca]       0.07       0.05, 0.08    <0.001
trial_type [Trial type 4] * method [bca]       0.02      -0.00, 0.04    0.151

Random Effects
σ2                                 0.33
τ00 domain                         0.02
τ11 domain.trial_typeTrial type 2  0.02
τ11 domain.trial_typeTrial type 3  0.03
τ11 domain.trial_typeTrial type 4  0.02
ρ01                                -0.84, -0.78, -0.71
ICC                                0.03
N domain                           35
Observations                       560
Marginal R2 / Conditional R2       0.016 / 0.048
# summary
summary(fit_diff_zero_method)
## Linear mixed model fit by REML ['lmerMod']
## Formula: proportion_diff_zero ~ 1 + trial_type * method + (trial_type |  
##     domain)
##    Data: .
## Weights: 1/variance
## 
## REML criterion at convergence: -1517.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2582 -0.3883  0.1004  0.5687  3.1523 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr             
##  domain   (Intercept)            0.01746  0.1321                    
##           trial_typeTrial type 2 0.01680  0.1296   -0.84            
##           trial_typeTrial type 3 0.02650  0.1628   -0.78  0.80      
##           trial_typeTrial type 4 0.02095  0.1447   -0.71  0.71  0.73
##  Residual                        0.33022  0.5746                    
## Number of obs: 560, groups:  domain, 35
## 
## Fixed effects:
##                                      Estimate Std. Error t value
## (Intercept)                          0.308929   0.023434  13.183
## trial_typeTrial type 2              -0.192111   0.023318  -8.239
## trial_typeTrial type 3              -0.160350   0.028678  -5.591
## trial_typeTrial type 4              -0.127595   0.026081  -4.892
## methodnormal                         0.031708   0.009494   3.340
## methodbasic                          0.064431   0.009627   6.693
## methodbca                           -0.067421   0.008871  -7.600
## trial_typeTrial type 2:methodnormal  0.027250   0.011405   2.389
## trial_typeTrial type 3:methodnormal  0.036709   0.011693   3.140
## trial_typeTrial type 4:methodnormal -0.008180   0.012075  -0.677
## trial_typeTrial type 2:methodbasic   0.033191   0.011771   2.820
## trial_typeTrial type 3:methodbasic   0.038777   0.012038   3.221
## trial_typeTrial type 4:methodbasic  -0.005323   0.012405  -0.429
## trial_typeTrial type 2:methodbca     0.066640   0.008908   7.481
## trial_typeTrial type 3:methodbca     0.066750   0.008908   7.493
## trial_typeTrial type 4:methodbca     0.021179   0.010817   1.958
# plot re
plot_model(fit_diff_zero_method, 
           colors = "bw",
           type = "re") +
  mdthemes::md_theme_linedraw() +
  ggtitle("")

# extract re Tau
results_re_tau_diff_zero_method <- fit_diff_zero_method %>%
  merTools::REsdExtract() %>%
  as_tibble(rownames = "trial_type") %>%
  rename(tau = value) %>%
  mutate(trial_type = case_when(trial_type == "domain.(Intercept)" ~ "Trial type 1",
                                trial_type == "domain.trial_typeTrial type 2" ~ "Trial type 2", 
                                trial_type == "domain.trial_typeTrial type 3" ~ "Trial type 3", 
                                trial_type == "domain.trial_typeTrial type 4" ~ "Trial type 4"))

# extract marginal means
results_emm_diff_zero_method <- 
  summary(emmeans(fit_diff_zero_method, ~ method | trial_type)) %>%
  dplyr::select(method, trial_type, estimate = emmean, se = SE, ci_lower = lower.CL, ci_upper = upper.CL) %>%
  mutate(trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4"))

# combine
results_diff_zero_method <- results_emm_diff_zero_method %>%
  full_join(results_re_tau_diff_zero_method, by = "trial_type") %>%
  mutate(cr_lower = estimate - (1.96 * sqrt(se^2 + tau^2)),  # as in metafor package's implementation of credibility intervals, see metafor::predict.rma.R 
         cr_upper = estimate + (1.96 * sqrt(se^2 + tau^2)))

# save to disk
write_rds(results_diff_zero_method, "models/results_diff_zero_method.rds")
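The credibility-interval arithmetic above is worth seeing with toy numbers: the 95% CI around a marginal mean reflects only its SE, while the credibility interval additionally folds in tau, the SD of the true effects across domains (lme4's random-effect Std.Dev., metafor's tau). The values below are purely illustrative, not taken from any model fitted here.

```r
# Illustrative values only, not taken from the fitted models above
estimate <- 0.30  # a marginal mean
se       <- 0.02  # SE of the marginal mean (fixed-effect uncertainty only)
tau      <- 0.13  # SD of the random effect across domains

# 95% confidence interval: fixed-effect uncertainty only
ci <- estimate + c(-1.96, 1.96) * se

# 95% credibility interval: also incorporates between-domain heterogeneity,
# as in metafor's predict.rma
cr <- estimate + c(-1.96, 1.96) * sqrt(se^2 + tau^2)

round(ci, 3)  # 0.261 0.339
round(cr, 3)  # 0.042 0.558
```

Even a modest tau dominates the SE here, which is why the dotted credibility intervals in the plots are so much wider than the solid confidence intervals.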

# tests
emmeans(fit_diff_zero_method, list(pairwise ~ method | trial_type), adjust = "bonferroni")
## $`emmeans of method | trial_type`
## trial_type = Trial type 1:
##  method  emmean     SE   df lower.CL upper.CL
##  percent  0.309 0.0240 1080   0.2490    0.369
##  normal   0.341 0.0240 1086   0.2806    0.401
##  basic    0.373 0.0241 1091   0.3132    0.434
##  bca      0.242 0.0238 1063   0.1820    0.301
## 
## trial_type = Trial type 2:
##  method  emmean     SE   df lower.CL upper.CL
##  percent  0.117 0.0132 3220   0.0838    0.150
##  normal   0.176 0.0140 3885   0.1408    0.211
##  basic    0.214 0.0142 4040   0.1790    0.250
##  bca      0.116 0.0132 3215   0.0830    0.149
## 
## trial_type = Trial type 3:
##  method  emmean     SE   df lower.CL upper.CL
##  percent  0.149 0.0179 1946   0.1039    0.193
##  normal   0.217 0.0186 2180   0.1706    0.263
##  basic    0.252 0.0187 2220   0.2050    0.299
##  bca      0.148 0.0179 1945   0.1032    0.193
## 
## trial_type = Trial type 4:
##  method  emmean     SE   df lower.CL upper.CL
##  percent  0.181 0.0191 1673   0.1335    0.229
##  normal   0.205 0.0193 1704   0.1567    0.253
##  basic    0.240 0.0194 1735   0.1920    0.289
##  bca      0.135 0.0189 1619   0.0879    0.182
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of method | trial_type`
## trial_type = Trial type 1:
##  contrast          estimate       SE           df t.ratio p.value
##  percent - normal -0.031708 0.009494     18390245  -3.340 0.0050 
##  percent - basic  -0.064431 0.009627     17350004  -6.693 <.0001 
##  percent - bca     0.067421 0.008872     23853769   7.600 <.0001 
##  normal - basic   -0.032723 0.009766     16426027  -3.351 0.0048 
##  normal - bca      0.099129 0.009032     22034114  10.975 <.0001 
##  basic - bca       0.131852 0.009176     20512833  14.369 <.0001 
## 
## trial_type = Trial type 2:
##  contrast          estimate       SE           df t.ratio p.value
##  percent - normal -0.058958 0.006320     52184590  -9.328 <.0001 
##  percent - basic  -0.097622 0.006774     38249409 -14.411 <.0001 
##  percent - bca     0.000781 0.000806 330766001965   0.969 1.0000 
##  normal - basic   -0.038664 0.008135     33864472  -4.753 <.0001 
##  normal - bca      0.059738 0.006308     52119682   9.471 <.0001 
##  basic - bca       0.098403 0.006763     38176259  14.551 <.0001 
## 
## trial_type = Trial type 3:
##  contrast          estimate       SE           df t.ratio p.value
##  percent - normal -0.068417 0.006826     38328005 -10.023 <.0001 
##  percent - basic  -0.103208 0.007228     29755644 -14.279 <.0001 
##  percent - bca     0.000671 0.000807 335130816874   0.831 1.0000 
##  normal - basic   -0.034791 0.008684     26144671  -4.006 0.0004 
##  normal - bca      0.069088 0.006816     38292899  10.136 <.0001 
##  basic - bca       0.103879 0.007219     29717292  14.390 <.0001 
## 
## trial_type = Trial type 4:
##  contrast          estimate       SE           df t.ratio p.value
##  percent - normal -0.023528 0.007462     47629729  -3.153 0.0097 
##  percent - basic  -0.059108 0.007823     38385761  -7.555 <.0001 
##  percent - bca     0.046242 0.006189     94703959   7.472 <.0001 
##  normal - basic   -0.035580 0.008115     34205153  -4.384 0.0001 
##  normal - bca      0.069770 0.006756     62785077  10.326 <.0001 
##  basic - bca       0.105350 0.007200     45936913  14.631 <.0001 
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 6 tests
emmeans(fit_diff_zero_method, list(pairwise ~ trial_type | method), adjust = "bonferroni")
## $`emmeans of trial_type | method`
## method = percent:
##  trial_type   emmean     SE   df lower.CL upper.CL
##  Trial type 1  0.309 0.0240 1080   0.2490    0.369
##  Trial type 2  0.117 0.0132 3220   0.0838    0.150
##  Trial type 3  0.149 0.0179 1946   0.1039    0.193
##  Trial type 4  0.181 0.0191 1673   0.1335    0.229
## 
## method = normal:
##  trial_type   emmean     SE   df lower.CL upper.CL
##  Trial type 1  0.341 0.0240 1086   0.2806    0.401
##  Trial type 2  0.176 0.0140 3885   0.1408    0.211
##  Trial type 3  0.217 0.0186 2180   0.1706    0.263
##  Trial type 4  0.205 0.0193 1704   0.1567    0.253
## 
## method = basic:
##  trial_type   emmean     SE   df lower.CL upper.CL
##  Trial type 1  0.373 0.0241 1091   0.3132    0.434
##  Trial type 2  0.214 0.0142 4040   0.1790    0.250
##  Trial type 3  0.252 0.0187 2220   0.2050    0.299
##  Trial type 4  0.240 0.0194 1735   0.1920    0.289
## 
## method = bca:
##  trial_type   emmean     SE   df lower.CL upper.CL
##  Trial type 1  0.242 0.0238 1063   0.1820    0.301
##  Trial type 2  0.116 0.0132 3215   0.0830    0.149
##  Trial type 3  0.148 0.0179 1945   0.1032    0.193
##  Trial type 4  0.135 0.0189 1619   0.0879    0.182
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of trial_type | method`
## method = percent:
##  contrast                    estimate     SE   df t.ratio p.value
##  Trial type 1 - Trial type 2   0.1921 0.0237 2123  8.109  <.0001 
##  Trial type 1 - Trial type 3   0.1604 0.0292 1674  5.493  <.0001 
##  Trial type 1 - Trial type 4   0.1276 0.0265 2380  4.819  <.0001 
##  Trial type 2 - Trial type 3  -0.0318 0.0175 6443 -1.814  0.4187 
##  Trial type 2 - Trial type 4  -0.0645 0.0191 6029 -3.371  0.0045 
##  Trial type 3 - Trial type 4  -0.0328 0.0208 5439 -1.578  0.6882 
## 
## method = normal:
##  contrast                    estimate     SE   df t.ratio p.value
##  Trial type 1 - Trial type 2   0.1649 0.0242 2284  6.819  <.0001 
##  Trial type 1 - Trial type 3   0.1236 0.0297 1756  4.170  0.0002 
##  Trial type 1 - Trial type 4   0.1358 0.0266 2413  5.102  <.0001 
##  Trial type 2 - Trial type 3  -0.0412 0.0188 7947 -2.197  0.1683 
##  Trial type 2 - Trial type 4  -0.0291 0.0198 6724 -1.469  0.8512 
##  Trial type 3 - Trial type 4   0.0121 0.0215 5975  0.566  1.0000 
## 
## method = basic:
##  contrast                    estimate     SE   df t.ratio p.value
##  Trial type 1 - Trial type 2   0.1589 0.0243 2331  6.529  <.0001 
##  Trial type 1 - Trial type 3   0.1216 0.0298 1774  4.082  0.0003 
##  Trial type 1 - Trial type 4   0.1329 0.0267 2445  4.969  <.0001 
##  Trial type 2 - Trial type 3  -0.0373 0.0190 8242 -1.961  0.2996 
##  Trial type 2 - Trial type 4  -0.0260 0.0201 6951 -1.296  1.0000 
##  Trial type 3 - Trial type 4   0.0113 0.0217 6124  0.523  1.0000 
## 
## method = bca:
##  contrast                    estimate     SE   df t.ratio p.value
##  Trial type 1 - Trial type 2   0.1255 0.0235 2094  5.337  <.0001 
##  Trial type 1 - Trial type 3   0.0936 0.0290 1660  3.223  0.0078 
##  Trial type 1 - Trial type 4   0.1064 0.0261 2312  4.073  0.0003 
##  Trial type 2 - Trial type 3  -0.0319 0.0175 6435 -1.821  0.4121 
##  Trial type 2 - Trial type 4  -0.0191 0.0189 5901 -1.010  1.0000 
##  Trial type 3 - Trial type 4   0.0128 0.0205 5314  0.625  1.0000 
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 6 tests
# plot
p_prop_nonzero_method <- 
  ggplot(results_diff_zero_method, aes(fct_rev(trial_type), estimate, color = method, shape = method, group = method)) +
  geom_linerange(aes(ymin = cr_lower, ymax = cr_upper), position = position_dodge(width = dodge_width), linetype = "dotted") +
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), size = 1, position = position_dodge(width = dodge_width)) +
  # geom_line(position = position_dodge(width = dodge_width)) +
  geom_point(position = position_dodge(width = dodge_width), size = 2.5) +
  mdthemes::md_theme_linedraw() +
  scale_y_continuous(breaks = c(0, .25, .5, .75, 1), labels = c("0.00<br/>(Worse)", "0.25", "0.50", "0.75", "1.00<br/>(Better)")) +
  scale_shape_discrete(labels = c("BCA", "Basic", "Normal", "Percentile")) +
  scale_color_viridis_d(alpha = 1, begin = 0.3, end = 0.7, labels = c("BCA", "Basic", "Normal", "Percentile")) +
  labs(shape = "Bootstrapping method",
       color = "Bootstrapping method",
       x = "",
       y = "Proportion of participants with non-zero scores") + 
  theme(legend.position = "top") +
  coord_flip(ylim = c(0, 1))

p_prop_nonzero_method

Proportion different from one another

Within domain and trial type

Calculate discriminability

Many have argued that the zero point of these scores is arbitrary and not a useful reference point. Rather than asking “what proportion of D/PI scores are different from zero?”, we can therefore ask “what proportion of D/PI scores are different from one another?”

# helper function to apply workflow to each resample
discriminability <- function(data, i) {
  
  data_with_indexes <- data[i,] # boot function requires data and index
  
  estimate <- data_with_indexes$estimate
  ci_lower <- data_with_indexes$ci_lower
  ci_upper <- data_with_indexes$ci_upper
  
  n_estimate <- length(estimate)
  n_ci_lower <- length(ci_lower)
  n_ci_upper <- length(ci_upper)
  
  r_estimate <- sum(rank(c(estimate, ci_lower))[1:n_estimate])
  r_ci_upper <- sum(rank(c(ci_upper, estimate))[1:n_ci_upper])
  
  prob_estimate_inferior_to_ci_lower <- 1 - (r_estimate / n_estimate - (n_estimate + 1) / 2) / n_ci_lower
  prob_estimate_superior_to_ci_upper <- 1 - (r_ci_upper / n_ci_upper - (n_ci_upper + 1) / 2) / n_estimate
  
  probability_estimates_outside_cis <- (prob_estimate_inferior_to_ci_lower + prob_estimate_superior_to_ci_upper)
  
  return(probability_estimates_outside_cis)
  
}

bootstrap_discriminability <- function(data){
  
  require(dplyr)
  require(boot)
  
  fit <- 
    boot::boot(data      = data, 
               statistic = discriminability, 
               R         = n_boots,
               sim       = "ordinary", 
               stype     = "i",
               parallel  = "multicore", 
               ncpus     = parallel::detectCores())
  
  results <- boot::boot.ci(fit, conf = 0.95, type = c("norm","basic", "perc", "bca"))
  
  output <- 
    tibble(method   = c("normal", "basic", "percent", "bca"),
           estimate = rep(fit$t0, 4),
           ci_lower = c(results$normal[2], results$basic[4], results$percent[4], results$bca[4]),
           ci_upper = c(results$normal[3], results$basic[5], results$percent[5], results$bca[5]))
  
  return(output)
}
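To make the rank-based statistic concrete, here is a hand-checkable toy case with hypothetical numbers (two participants with estimates 0.0 and 1.0 and CIs of ±0.5 around them), inlining the same rank-sum arithmetic as `discriminability()` above:

```r
# Toy data: two participants (hypothetical values)
estimate <- c(0.0, 1.0)
ci_lower <- estimate - 0.5
ci_upper <- estimate + 0.5
n <- length(estimate)

# Rank-sum form of the common-language effect size, as in discriminability():
# P(a randomly drawn estimate falls below a randomly drawn lower CI bound)
r_est   <- sum(rank(c(estimate, ci_lower))[1:n])
p_below <- 1 - (r_est / n - (n + 1) / 2) / n

# P(a randomly drawn upper CI bound falls below a randomly drawn estimate)
r_upp   <- sum(rank(c(ci_upper, estimate))[1:n])
p_above <- 1 - (r_upp / n - (n + 1) / 2) / n

# Probability that one participant's estimate lies outside another's CI:
# participant 2's estimate (1.0) exceeds participant 1's upper bound (0.5),
# and participant 1's estimate (0.0) falls below participant 2's lower bound
# (0.5), so each term contributes 0.25 of the cross-participant comparisons
p_below + p_above  # 0.5
```

This is the statistic that `boot::boot()` resamples within each domain and trial type below.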

D scores

# bootstrapping has a long execution time, so load saved values if they've already been calculated
if(file.exists("models/data_discriminability_D.rds")) {
  
  data_discriminability_D <- read_rds("models/data_discriminability_D.rds")
  
} else {
  
  # bootstrap D scores 
  data_discriminability_D <- data_estimates_D %>%
    select(unique_id, domain, trial_type, estimate, ci_upper, ci_lower) %>%
    group_by(domain, trial_type) %>%
    do(bootstrap_discriminability(data = .)) %>%
    ungroup() %>%
    mutate(proportion_discriminable = estimate,
           variance = ((ci_upper - ci_lower)/(1.96*2))^2) %>%
    mutate(domain = as.factor(domain),
           method = fct_relevel(method, "bca", "basic", "normal", "percent"),
           trial_type = fct_relevel(trial_type, "tt1", "tt2", "tt3", "tt4"),
           DV_type = "*D* scores")
  
  # save to disk
  write_rds(data_discriminability_D, "models/data_discriminability_D.rds", compress = "gz")
  
}

PI scores

# bootstrapping has a long execution time, so load saved values if they've already been calculated
if(file.exists("models/data_discriminability_PI.rds")) {
  
  data_discriminability_PI <- read_rds("models/data_discriminability_PI.rds")
  
} else {
  
  # bootstrap PI scores 
  data_discriminability_PI <- data_estimates_PI %>%
    select(unique_id, domain, trial_type, estimate, ci_upper, ci_lower) %>%
    group_by(domain, trial_type) %>%
    do(bootstrap_discriminability(data = .)) %>%
    ungroup() %>%
    mutate(proportion_discriminable = estimate,
           variance = ((ci_upper - ci_lower)/(1.96*2))^2) %>%
    mutate(domain = as.factor(domain),
           method = fct_relevel(method, "bca", "basic", "normal", "percent"),
           trial_type = fct_relevel(trial_type, "tt1", "tt2", "tt3", "tt4"),
           DV_type = "PI scores")
  
  # save to disk
  write_rds(data_discriminability_PI, "models/data_discriminability_PI.rds", compress = "gz")
  
}

Model

Compare scoring methods

# combine
data_discriminability_combined_scoring <- 
  bind_rows(data_discriminability_D,
            data_discriminability_PI) %>%
  filter(method == "bca") %>%
  mutate(trial_type = case_when(trial_type == "tt1" ~ "Trial type 1",
                                trial_type == "tt2" ~ "Trial type 2",   
                                trial_type == "tt3" ~ "Trial type 3",   
                                trial_type == "tt4" ~ "Trial type 4"))

# fit meta analytic model
fit_disciminability_scoring <- 
  lmer(proportion_discriminable ~ 1 + trial_type * DV_type + (trial_type | domain), 
       weights = 1/variance, 
       data = data_discriminability_combined_scoring)

# results table
tab_model(fit_disciminability_scoring, 
          string.p = "p (corrected)", 
          ci.hyphen = ", ",
          #emph.p = FALSE,
          p.adjust = "holm")
  proportion discriminable

Predictors                                       Estimates  CI           p (corrected)
(Intercept)                                       0.26      0.23, 0.29    <0.001
trial_type [Trial type 2]                        -0.02     -0.05, 0.01    1.000
trial_type [Trial type 3]                         0.01     -0.03, 0.04    1.000
trial_type [Trial type 4]                        -0.00     -0.03, 0.03    1.000
DV_type PI scores                                -0.00     -0.01, 0.01    1.000
trial_type [Trial type 2] * DV_type PI scores    -0.01     -0.02, 0.01    1.000
trial_type [Trial type 3] * DV_type PI scores    -0.01     -0.02, 0.01    1.000
trial_type [Trial type 4] * DV_type PI scores    -0.00     -0.02, 0.01    1.000

Random Effects
σ2                                 0.94
τ00 domain                         0.01
τ11 domain.trial_typeTrial type 2  0.01
τ11 domain.trial_typeTrial type 3  0.01
τ11 domain.trial_typeTrial type 4  0.01
ρ01                                -0.62, -0.42, -0.65
ICC                                0.01
N domain                           35
Observations                       280
Marginal R2 / Conditional R2       0.000 / 0.007
# plot RE effects
plot_model(fit_disciminability_scoring, 
           colors = "bw",
           type = "re") +
  mdthemes::md_theme_linedraw() +
  ggtitle("")

# extract re Tau
results_re_tau_disciminability_scoring <- fit_disciminability_scoring %>%
  merTools::REsdExtract() %>%
  as_tibble(rownames = "trial_type") %>%
  rename(tau = value) %>%
  mutate(trial_type = case_when(trial_type == "domain.(Intercept)" ~ "Trial type 1",
                                trial_type == "domain.trial_typeTrial type 2" ~ "Trial type 2", 
                                trial_type == "domain.trial_typeTrial type 3" ~ "Trial type 3", 
                                trial_type == "domain.trial_typeTrial type 4" ~ "Trial type 4"))

# extract marginal means
results_emm_disciminability_scoring <- 
  summary(emmeans(fit_disciminability_scoring, ~ DV_type | trial_type)) %>%
  dplyr::select(DV_type, trial_type, estimate = emmean, se = SE, ci_lower = lower.CL, ci_upper = upper.CL) %>%
  mutate(trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4"))

# combine
results_disciminability_scoring <- results_emm_disciminability_scoring %>%
  full_join(results_re_tau_disciminability_scoring, by = "trial_type") %>%
  mutate(cr_lower = estimate - (1.96 * sqrt(se^2 + tau^2)),  # as in metafor package's implementation of credibility intervals, see metafor::predict.rma.R 
         cr_upper = estimate + (1.96 * sqrt(se^2 + tau^2)))

# save to disk
write_rds(results_disciminability_scoring, "models/results_disciminability_scoring.rds")

# tests
emmeans(fit_disciminability_scoring, list(pairwise ~ DV_type | trial_type), adjust = "bonferroni")
## $`emmeans of DV_type | trial_type`
## trial_type = Trial type 1:
##  DV_type    emmean     SE     df lower.CL upper.CL
##  *D* scores  0.257 0.0147  84755    0.224    0.290
##  PI scores   0.257 0.0148  85049    0.224    0.290
## 
## trial_type = Trial type 2:
##  DV_type    emmean     SE     df lower.CL upper.CL
##  *D* scores  0.236 0.0132  72585    0.207    0.266
##  PI scores   0.230 0.0132  72895    0.200    0.259
## 
## trial_type = Trial type 3:
##  DV_type    emmean     SE     df lower.CL upper.CL
##  *D* scores  0.267 0.0174  41481    0.227    0.306
##  PI scores   0.261 0.0175  41607    0.222    0.300
## 
## trial_type = Trial type 4:
##  DV_type    emmean     SE     df lower.CL upper.CL
##  *D* scores  0.257 0.0129 104739    0.228    0.286
##  PI scores   0.254 0.0129 104745    0.225    0.283
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 2 estimates 
## 
## $`pairwise differences of DV_type | trial_type`
## trial_type = Trial type 1:
##  contrast               estimate      SE        df t.ratio p.value
##  *D* scores - PI scores 0.000148 0.00606 302334465 0.024   0.9805 
## 
## trial_type = Trial type 2:
##  contrast               estimate      SE        df t.ratio p.value
##  *D* scores - PI scores 0.006683 0.00579 361738352 1.154   0.2486 
## 
## trial_type = Trial type 3:
##  contrast               estimate      SE        df t.ratio p.value
##  *D* scores - PI scores 0.005552 0.00591 334620188 0.940   0.3472 
## 
## trial_type = Trial type 4:
##  contrast               estimate      SE        df t.ratio p.value
##  *D* scores - PI scores 0.003001 0.00611 293120849 0.492   0.6230 
## 
## Degrees-of-freedom method: kenward-roger
emmeans(fit_disciminability_scoring, list(pairwise ~ DV_type), adjust = "bonferroni")
## $`emmeans of DV_type`
##  DV_type    emmean     SE    df lower.CL upper.CL
##  *D* scores  0.254 0.0109 30357    0.230    0.279
##  PI scores   0.250 0.0109 30347    0.226    0.275
## 
## Results are averaged over the levels of: trial_type 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 2 estimates 
## 
## $`pairwise differences of DV_type`
##  contrast               estimate      SE        df t.ratio p.value
##  *D* scores - PI scores  0.00385 0.00298 320272315 1.289   0.1973 
## 
## Results are averaged over the levels of: trial_type 
## Degrees-of-freedom method: kenward-roger
emmeans(fit_disciminability_scoring, list(pairwise ~ trial_type | DV_type), adjust = "bonferroni")
## $`emmeans of trial_type | DV_type`
## DV_type = *D* scores:
##  trial_type   emmean     SE     df lower.CL upper.CL
##  Trial type 1  0.257 0.0147  84755    0.220    0.294
##  Trial type 2  0.236 0.0132  72585    0.203    0.269
##  Trial type 3  0.267 0.0174  41481    0.223    0.310
##  Trial type 4  0.257 0.0129 104739    0.224    0.289
## 
## DV_type = PI scores:
##  trial_type   emmean     SE     df lower.CL upper.CL
##  Trial type 1  0.257 0.0148  85049    0.220    0.294
##  Trial type 2  0.230 0.0132  72895    0.197    0.263
##  Trial type 3  0.261 0.0175  41607    0.217    0.305
##  Trial type 4  0.254 0.0129 104745    0.221    0.286
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of trial_type | DV_type`
## DV_type = *D* scores:
##  contrast                     estimate     SE     df t.ratio p.value
##  Trial type 1 - Trial type 2  0.020896 0.0159 459177  1.312  1.0000 
##  Trial type 1 - Trial type 3 -0.009305 0.0180 219859 -0.516  1.0000 
##  Trial type 1 - Trial type 4  0.000598 0.0159 504342  0.038  1.0000 
##  Trial type 2 - Trial type 3 -0.030201 0.0145 347025 -2.081  0.2246 
##  Trial type 2 - Trial type 4 -0.020298 0.0148 682089 -1.375  1.0000 
##  Trial type 3 - Trial type 4  0.009903 0.0164 218378  0.602  1.0000 
## 
## DV_type = PI scores:
##  contrast                     estimate     SE     df t.ratio p.value
##  Trial type 1 - Trial type 2  0.027430 0.0160 458847  1.715  0.5184 
##  Trial type 1 - Trial type 3 -0.003901 0.0181 220162 -0.215  1.0000 
##  Trial type 1 - Trial type 4  0.003450 0.0159 502959  0.217  1.0000 
##  Trial type 2 - Trial type 3 -0.031331 0.0146 347919 -2.146  0.1913 
##  Trial type 2 - Trial type 4 -0.023980 0.0148 681744 -1.621  0.6298 
##  Trial type 3 - Trial type 4  0.007352 0.0165 218012  0.446  1.0000 
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 6 tests
# plot
p_prop_discriminable_scoring <- 
  ggplot(results_disciminability_scoring, aes(fct_rev(trial_type), estimate, color = DV_type, shape = DV_type, group = DV_type)) +
  geom_linerange(aes(ymin = cr_lower, ymax = cr_upper), position = position_dodge(width = dodge_width), linetype = "dotted") +
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), size = 1, position = position_dodge(width = dodge_width)) +
  # geom_line(position = position_dodge(width = dodge_width)) +
  geom_point(position = position_dodge(width = dodge_width), size = 2.5) +
  scale_y_continuous(breaks = c(0, .25, .5, .75, 1), labels = c("0.00<br/>(Worse)", "0.25", "0.50", "0.75", "1.00<br/>(Better)")) +
  scale_shape_discrete() +
  scale_color_viridis_d(begin = 0.3, end = 0.7) +
  mdthemes::md_theme_linedraw() +
  scale_x_discrete(labels = c("tt1" = "Trial type 1", "tt2" = "Trial type 2", "tt3" = "Trial type 3", "tt4" = "Trial type 4")) +
  labs(x = "",
       y = "Proportion of participants whose scores<br/>differ from one another in pairwise comparisons<br/>",
       color = "Scoring method",
       shape = "Scoring method") + 
  theme(legend.position = "top") +
  coord_flip(ylim = c(0, 1))

p_prop_discriminable_scoring

Compare CI bootstrapping methods

# fit meta analytic model
fit_disciminability_method <- 
  lmer(proportion_discriminable ~ 1 + trial_type * method + (method | domain),  # model did not converge with trial_type as a random slope, so it was removed
       weights = 1/variance, 
       data = data_discriminability_D)

# results table
tab_model(fit_disciminability_method, 
          string.p = "p (corrected)", 
          ci.hyphen = ", ",
          #emph.p = FALSE,
          p.adjust = "holm")
  proportion discriminable

Predictors                             Estimates  CI            p (corrected)
(Intercept)                             0.26      0.23, 0.28     <0.001
trial_type [tt2]                       -0.03     -0.05, -0.00    0.286
trial_type [tt3]                        0.00     -0.02, 0.03     1.000
trial_type [tt4]                       -0.01     -0.03, 0.01     1.000
method [basic]                         -0.00     -0.02, 0.02     1.000
method [normal]                        -0.00     -0.02, 0.02     1.000
method [percent]                       -0.00     -0.02, 0.02     1.000
trial_type [tt2] * method [basic]      -0.00     -0.03, 0.03     1.000
trial_type [tt3] * method [basic]      -0.00     -0.03, 0.03     1.000
trial_type [tt4] * method [basic]      -0.00     -0.03, 0.03     1.000
trial_type [tt2] * method [normal]     -0.00     -0.03, 0.03     1.000
trial_type [tt3] * method [normal]     -0.00     -0.03, 0.03     1.000
trial_type [tt4] * method [normal]     -0.00     -0.03, 0.03     1.000
trial_type [tt2] * method [percent]    -0.00     -0.03, 0.03     1.000
trial_type [tt3] * method [percent]    -0.00     -0.03, 0.03     1.000
trial_type [tt4] * method [percent]    -0.00     -0.03, 0.03     1.000

Random Effects
σ2                            3.70
τ00 domain                    0.00
τ11 domain.methodbasic        0.00
τ11 domain.methodnormal       0.00
τ11 domain.methodpercent      0.00
ρ01                           1.00, 0.98, 1.00
N domain                      35
Observations                  560
Marginal R2 / Conditional R2  0.000 / NA
# summary
summary(fit_disciminability_method)
## Linear mixed model fit by REML ['lmerMod']
## Formula: proportion_discriminable ~ 1 + trial_type * method + (method |  
##     domain)
##    Data: data_discriminability_D
## Weights: 1/variance
## 
## REML criterion at convergence: -1463.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.83874 -0.57509  0.05272  0.71104  2.34296 
## 
## Random effects:
##  Groups   Name          Variance      Std.Dev.  Corr          
##  domain   (Intercept)   0.00493695932 0.0702635               
##           methodbasic   0.00000013075 0.0003616 1.00          
##           methodnormal  0.00000005142 0.0002268 0.98 0.99     
##           methodpercent 0.00000011709 0.0003422 1.00 1.00 0.99
##  Residual               3.69796864308 1.9230103               
## Number of obs: 560, groups:  domain, 35
## 
## Fixed effects:
##                                Estimate  Std. Error t value
## (Intercept)                  0.25527306  0.01453495  17.563
## trial_typett2               -0.02699398  0.01151785  -2.344
## trial_typett3                0.00388519  0.01160085   0.335
## trial_typett4               -0.00885239  0.01189407  -0.744
## methodbasic                 -0.00039522  0.01171982  -0.034
## methodnormal                -0.00002344  0.01169346  -0.002
## methodpercent               -0.00039411  0.01171981  -0.034
## trial_typett2:methodbasic   -0.00009045  0.01625505  -0.006
## trial_typett3:methodbasic   -0.00056295  0.01637214  -0.034
## trial_typett4:methodbasic   -0.00012008  0.01676964  -0.007
## trial_typett2:methodnormal  -0.00054747  0.01623850  -0.034
## trial_typett3:methodnormal  -0.00123340  0.01636735  -0.075
## trial_typett4:methodnormal  -0.00051009  0.01673644  -0.030
## trial_typett2:methodpercent -0.00009243  0.01625505  -0.006
## trial_typett3:methodpercent -0.00056419  0.01637213  -0.034
## trial_typett4:methodpercent -0.00012157  0.01676964  -0.007
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
# plot re
plot_model(fit_disciminability_method, 
           colors = "bw",
           type = "re") +
  mdthemes::md_theme_linedraw() +
  ggtitle("")

# extract re Tau
results_re_tau_disciminability_method <- fit_disciminability_method %>%
  merTools::REsdExtract() %>%
  #as_tibble(rownames = "trial_type") %>%
  as_tibble(rownames = "method") %>%
  rename(tau = value) %>%
  # mutate(trial_type = case_when(trial_type == "domain.(Intercept)" ~ "Trial type 1",
  #                               trial_type == "domain.trial_typeTrial type 2" ~ "Trial type 2",   
  #                               trial_type == "domain.trial_typeTrial type 3" ~ "Trial type 3",   
  #                               trial_type == "domain.trial_typeTrial type 4" ~ "Trial type 4"))
  mutate(method = case_when(method == "domain.(Intercept)" ~ "bca",
                            method == "domain.methodbasic" ~ "basic",
                            method == "domain.methodnormal" ~ "normal",
                            method == "domain.methodpercent" ~ "percent"))

# extract marginal means
results_emm_disciminability_method <- 
  summary(emmeans(fit_disciminability_method, ~ method | trial_type)) %>%
  dplyr::select(method, trial_type, estimate = emmean, se = SE, ci_lower = lower.CL, ci_upper = upper.CL)
  #mutate(trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4"))

# combine
results_disciminability_method <- results_emm_disciminability_method %>%
  #full_join(results_re_tau_disciminability_method, by = "trial_type") %>%
  full_join(results_re_tau_disciminability_method, by = "method") %>%
  mutate(cr_lower = estimate - (1.96 * sqrt(se^2 + tau^2)),  # as in metafor package's implementation of credibility intervals, see metafor::predict.rma.R 
         cr_upper = estimate + (1.96 * sqrt(se^2 + tau^2))) %>%
  mutate(method = fct_relevel(method, "bca", "basic", "normal", "percent"))

# save to disk
write_rds(results_disciminability_method, "models/results_disciminability_method.rds")

# tests
emmeans(fit_disciminability_method, list(pairwise ~ method | trial_type), adjust = "bonferroni")
## $`emmeans of method | trial_type`
## trial_type = tt1:
##  method  emmean     SE     df lower.CL upper.CL
##  bca      0.255 0.0146 149861    0.219    0.292
##  basic    0.255 0.0146 149253    0.218    0.291
##  normal   0.255 0.0146 148896    0.219    0.292
##  percent  0.255 0.0146 149307    0.218    0.291
## 
## trial_type = tt2:
##  method  emmean     SE     df lower.CL upper.CL
##  bca      0.228 0.0144 144363    0.192    0.264
##  basic    0.228 0.0144 142730    0.192    0.264
##  normal   0.228 0.0144 143139    0.192    0.264
##  percent  0.228 0.0144 142778    0.192    0.264
## 
## trial_type = tt3:
##  method  emmean     SE     df lower.CL upper.CL
##  bca      0.259 0.0145 146641    0.223    0.295
##  basic    0.258 0.0145 145265    0.222    0.294
##  normal   0.258 0.0145 146111    0.222    0.294
##  percent  0.258 0.0145 145316    0.222    0.294
## 
## trial_type = tt4:
##  method  emmean     SE     df lower.CL upper.CL
##  bca      0.246 0.0147 154120    0.210    0.283
##  basic    0.246 0.0147 151998    0.209    0.283
##  normal   0.246 0.0147 151786    0.209    0.283
##  percent  0.246 0.0147 152054    0.209    0.283
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of method | trial_type`
## trial_type = tt1:
##  contrast             estimate     SE        df t.ratio p.value
##  bca - basic       0.000395222 0.0117 754088397  0.034  1.0000 
##  bca - normal      0.000023437 0.0117 763003410  0.002  1.0000 
##  bca - percent     0.000394112 0.0117 754435997  0.034  1.0000 
##  basic - normal   -0.000371785 0.0117 760211764 -0.032  1.0000 
##  basic - percent  -0.000001111 0.0117 754028552  0.000  1.0000 
##  normal - percent  0.000370675 0.0117 760350422  0.032  1.0000 
## 
## trial_type = tt2:
##  contrast             estimate     SE        df t.ratio p.value
##  bca - basic       0.000485673 0.0113 883780393  0.043  1.0000 
##  bca - normal      0.000570911 0.0113 885033942  0.051  1.0000 
##  bca - percent     0.000486546 0.0113 884214417  0.043  1.0000 
##  basic - normal    0.000085238 0.0112 896183865  0.008  1.0000 
##  basic - percent   0.000000873 0.0112 897925167  0.000  1.0000 
##  normal - percent -0.000084365 0.0112 896353671 -0.008  1.0000 
## 
## trial_type = tt3:
##  contrast             estimate     SE        df t.ratio p.value
##  bca - basic       0.000958171 0.0114 832909133  0.084  1.0000 
##  bca - normal      0.001256840 0.0115 829123868  0.110  1.0000 
##  bca - percent     0.000958299 0.0114 833299295  0.084  1.0000 
##  basic - normal    0.000298670 0.0114 836214458  0.026  1.0000 
##  basic - percent   0.000000128 0.0114 842635860  0.000  1.0000 
##  normal - percent -0.000298542 0.0114 836348733 -0.026  1.0000 
## 
## trial_type = tt4:
##  contrast             estimate     SE        df t.ratio p.value
##  bca - basic       0.000515307 0.0120 688164428  0.043  1.0000 
##  bca - normal      0.000533523 0.0120 694332865  0.045  1.0000 
##  bca - percent     0.000515679 0.0120 688416784  0.043  1.0000 
##  basic - normal    0.000018216 0.0119 704845742  0.002  1.0000 
##  basic - percent   0.000000372 0.0120 700406298  0.000  1.0000 
##  normal - percent -0.000017844 0.0119 704958537 -0.001  1.0000 
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 6 tests
# plot
p_prop_discriminable_method <- 
  ggplot(results_disciminability_method, aes(trial_type, estimate, color = method, shape = method, group = method)) +
  geom_linerange(aes(ymin = cr_lower, ymax = cr_upper), position = position_dodge(width = dodge_width), linetype = "dotted") +
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), size = 1, position = position_dodge(width = dodge_width)) +
  # geom_line(position = position_dodge(width = dodge_width)) +
  geom_point(position = position_dodge(width = dodge_width), size = 2.5) +
  scale_y_continuous(breaks = c(0, .25, .5, .75, 1), labels = c("0.00<br/>(Worse)", "0.25", "0.50", "0.75", "1.00<br/>(Better)")) +
  scale_shape_discrete(labels = c("BCA", "Basic", "Normal", "Percentile")) +
  scale_color_viridis_d(alpha = 1, begin = 0.3, end = 0.7, labels = c("BCA", "Basic", "Normal", "Percentile")) +
  mdthemes::md_theme_linedraw() +
  scale_x_discrete(labels = c("tt1" = "Trial type 1", "tt2" = "Trial type 2", "tt3" = "Trial type 3", "tt4" = "Trial type 4")) +
  labs(x = "",
       y = "Proportion of participants whose scores<br/>differ from one another in pairwise comparisons",
       color = "Bootstrapping method",
       shape = "Bootstrapping method") + 
  theme(legend.position = "top") +
  coord_flip(ylim = c(0, 1))

p_prop_discriminable_method

CI widths as a proportion of observed range

Calculate scores

## calculate observed ranges 
observed_range_estimates_D <- data_estimates_D %>%
  group_by(domain, method, trial_type) %>%
  dplyr::summarize(min = min(estimate, na.rm = TRUE),
                   max = max(estimate, na.rm = TRUE),
                   .groups = "drop") %>%
  mutate(range = max - min) 

observed_range_estimates_PI <- data_estimates_PI %>%
  group_by(domain, method, trial_type) %>%
  dplyr::summarize(min = min(estimate, na.rm = TRUE),
                   max = max(estimate, na.rm = TRUE),
                   .groups = "drop") %>%
  mutate(range = max - min) 


# calculate CI / range 
data_ci_width_proportions_D <- data_estimates_D %>%
  # join this data into the original data
  full_join(observed_range_estimates_D, by = c("domain", "method", "trial_type")) %>%
  # calculate ci width as a proportion of observed range
  mutate(ci_width_proportion = ci_width / range) %>%
  mutate(DV_type = "*D* scores") 

data_ci_width_proportions_PI <- data_estimates_PI %>%
  # join this data into the original data
  full_join(observed_range_estimates_PI, by = c("domain", "method", "trial_type")) %>%
  # calculate ci width as a proportion of observed range
  mutate(ci_width_proportion = ci_width / range) %>%
  mutate(DV_type = "PI scores")

# combine
data_ci_width_proportions_combined <- 
  bind_rows(data_ci_width_proportions_D,
            data_ci_width_proportions_PI) %>%
  mutate(domain = as.factor(domain),
         method = fct_relevel(method, "bca", "basic", "normal", "percent"),
         trial_type = fct_relevel(trial_type, "tt1", "tt2", "tt3", "tt4"))

Model

Compare scoring method

# fit model
fit_ci_width_proportions_scoring <- 
  data_ci_width_proportions_combined %>%
  filter(method == "bca" & DV_type %in% c("*D* scores", "PI scores")) %>%
  mutate(trial_type = case_when(trial_type == "tt1" ~ "Trial type 1",
                                trial_type == "tt2" ~ "Trial type 2",   
                                trial_type == "tt3" ~ "Trial type 3",   
                                trial_type == "tt4" ~ "Trial type 4")) %>%
  lmer(ci_width_proportion ~ trial_type * DV_type + (trial_type | domain) + (1 | unique_id), 
       data = .)

# results table
tab_model(fit_ci_width_proportions_scoring,
          string.p = "p (corrected)",
          ci.hyphen = ", ",
          #emph.p = FALSE,
          p.adjust = "holm")
                                                  ci width proportion
Predictors                                        Estimates  CI            p (corrected)
(Intercept)                                        0.83      0.76, 0.90    <0.001
trial_type [Trial type 2]                          0.05      -0.01, 0.12   0.416
trial_type [Trial type 3]                          0.00      -0.09, 0.09   1.000
trial_type [Trial type 4]                         -0.03      -0.10, 0.05   1.000
DV_type [PI scores]                               -0.02      -0.03, -0.02  <0.001
trial_type [Trial type 2] * DV_type [PI scores]   -0.00      -0.01, 0.00   1.000
trial_type [Trial type 3] * DV_type [PI scores]   -0.02      -0.03, -0.01  0.002
trial_type [Trial type 4] * DV_type [PI scores]   -0.00      -0.01, 0.01   1.000
Random Effects
σ2                                 0.01
τ00 unique_id                      0.00
τ00 domain                         0.05
τ11 domain.trial_typeTrial type 2  0.03
τ11 domain.trial_typeTrial type 3  0.07
τ11 domain.trial_typeTrial type 4  0.05
ρ01 domain.trial_typeTrial type 2  -0.49
ρ01 domain.trial_typeTrial type 3  -0.47
ρ01 domain.trial_typeTrial type 4  -0.71
ICC                                0.87
N domain                           35
N unique_id                        1464
Observations                       11712
Marginal R2 / Conditional R2       0.018 / 0.870
# # plot RE effects
# plot_model(fit_ci_width_proportions_scoring, 
#            colors = "bw",
#            type = "re")[[1]] +
#   mdthemes::md_theme_linedraw() +
#   ggtitle("")

# extract re Tau
results_re_tau_ci_width_proportions_scoring <- 
  merTools::REsdExtract(fit_ci_width_proportions_scoring) %>%
  as_tibble(rownames = "trial_type") %>%
  rename(tau = value) %>%
  mutate(trial_type = case_when(trial_type == "domain.(Intercept)" ~ "Trial type 1",
                                trial_type == "domain.trial_typeTrial type 2" ~ "Trial type 2", 
                                trial_type == "domain.trial_typeTrial type 3" ~ "Trial type 3", 
                                trial_type == "domain.trial_typeTrial type 4" ~ "Trial type 4")) %>%
  drop_na()

# extract marginal means
results_emm_ci_width_proportions_scoring <-
  summary(emmeans(fit_ci_width_proportions_scoring, ~ DV_type | trial_type)) %>%
  dplyr::select(DV_type, trial_type, estimate = emmean, se = SE, ci_lower = asymp.LCL, ci_upper = asymp.UCL) %>%
  mutate(DV_type = fct_relevel(DV_type, "*D* scores", "PI scores"),
         trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4"))

# combine
results_ci_width_proportions_scoring <- results_emm_ci_width_proportions_scoring %>%
  full_join(results_re_tau_ci_width_proportions_scoring, by = "trial_type") %>%
  mutate(cr_lower = estimate - (1.96 * sqrt(se^2 + tau^2)),  # as in metafor package's implementation of credibility intervals, see metafor::predict.rma.R 
         cr_upper = estimate + (1.96 * sqrt(se^2 + tau^2)))

# save to disk
write_rds(results_ci_width_proportions_scoring, "models/results_ci_width_proportions_scoring.rds")

# tests
emmeans(fit_ci_width_proportions_scoring, list(pairwise ~ DV_type | trial_type), adjust = "bonferroni")
## $`emmeans of DV_type | trial_type`
## trial_type = Trial type 1:
##  DV_type    emmean     SE  df asymp.LCL asymp.UCL
##  *D* scores  0.830 0.0382 Inf     0.745     0.916
##  PI scores   0.808 0.0382 Inf     0.723     0.894
## 
## trial_type = Trial type 2:
##  DV_type    emmean     SE  df asymp.LCL asymp.UCL
##  *D* scores  0.885 0.0357 Inf     0.805     0.965
##  PI scores   0.858 0.0357 Inf     0.778     0.938
## 
## trial_type = Trial type 3:
##  DV_type    emmean     SE  df asymp.LCL asymp.UCL
##  *D* scores  0.831 0.0440 Inf     0.732     0.930
##  PI scores   0.793 0.0440 Inf     0.694     0.891
## 
## trial_type = Trial type 4:
##  DV_type    emmean     SE  df asymp.LCL asymp.UCL
##  *D* scores  0.804 0.0287 Inf     0.740     0.869
##  PI scores   0.782 0.0287 Inf     0.718     0.846
## 
## Degrees-of-freedom method: asymptotic 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 2 estimates 
## 
## $`pairwise differences of DV_type | trial_type`
## trial_type = Trial type 1:
##  contrast               estimate      SE  df z.ratio p.value
##  *D* scores - PI scores   0.0220 0.00323 Inf  6.805  <.0001 
## 
## trial_type = Trial type 2:
##  contrast               estimate      SE  df z.ratio p.value
##  *D* scores - PI scores   0.0267 0.00323 Inf  8.252  <.0001 
## 
## trial_type = Trial type 3:
##  contrast               estimate      SE  df z.ratio p.value
##  *D* scores - PI scores   0.0384 0.00323 Inf 11.883  <.0001 
## 
## trial_type = Trial type 4:
##  contrast               estimate      SE  df z.ratio p.value
##  *D* scores - PI scores   0.0221 0.00323 Inf  6.852  <.0001 
## 
## Degrees-of-freedom method: asymptotic
emmeans(fit_ci_width_proportions_scoring, list(pairwise ~ trial_type | DV_type), adjust = "bonferroni")
## $`emmeans of trial_type | DV_type`
## DV_type = *D* scores:
##  trial_type   emmean     SE  df asymp.LCL asymp.UCL
##  Trial type 1  0.830 0.0382 Inf     0.735     0.925
##  Trial type 2  0.885 0.0357 Inf     0.796     0.974
##  Trial type 3  0.831 0.0440 Inf     0.721     0.941
##  Trial type 4  0.804 0.0287 Inf     0.733     0.876
## 
## DV_type = PI scores:
##  trial_type   emmean     SE  df asymp.LCL asymp.UCL
##  Trial type 1  0.808 0.0382 Inf     0.713     0.903
##  Trial type 2  0.858 0.0357 Inf     0.769     0.947
##  Trial type 3  0.793 0.0440 Inf     0.683     0.902
##  Trial type 4  0.782 0.0287 Inf     0.710     0.854
## 
## Degrees-of-freedom method: asymptotic 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of trial_type | DV_type`
## DV_type = *D* scores:
##  contrast                     estimate     SE  df z.ratio p.value
##  Trial type 1 - Trial type 2 -0.054947 0.0317 Inf -1.733  0.4991 
##  Trial type 1 - Trial type 3 -0.000945 0.0460 Inf -0.021  1.0000 
##  Trial type 1 - Trial type 4  0.025783 0.0374 Inf  0.690  1.0000 
##  Trial type 2 - Trial type 3  0.054002 0.0370 Inf  1.459  0.8676 
##  Trial type 2 - Trial type 4  0.080730 0.0306 Inf  2.642  0.0495 
##  Trial type 3 - Trial type 4  0.026728 0.0366 Inf  0.730  1.0000 
## 
## DV_type = PI scores:
##  contrast                     estimate     SE  df z.ratio p.value
##  Trial type 1 - Trial type 2 -0.050271 0.0317 Inf -1.585  0.6777 
##  Trial type 1 - Trial type 3  0.015466 0.0460 Inf  0.336  1.0000 
##  Trial type 1 - Trial type 4  0.025937 0.0374 Inf  0.694  1.0000 
##  Trial type 2 - Trial type 3  0.065738 0.0370 Inf  1.776  0.4545 
##  Trial type 2 - Trial type 4  0.076208 0.0306 Inf  2.494  0.0759 
##  Trial type 3 - Trial type 4  0.010470 0.0366 Inf  0.286  1.0000 
## 
## Degrees-of-freedom method: asymptotic 
## P value adjustment: bonferroni method for 6 tests
# plot
dodge_width <- 0.8

p_ci_width_proportion_observed_range_scoring <- 
  ggplot(results_ci_width_proportions_scoring, aes(fct_rev(trial_type), estimate, color = DV_type, shape = DV_type, group = DV_type)) +
  geom_linerange(aes(ymin = cr_lower, ymax = cr_upper), position = position_dodge(width = dodge_width), linetype = "dotted") +
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), size = 1, position = position_dodge(width = dodge_width)) +
  # geom_line(position = position_dodge(width = dodge_width)) +
  geom_point(position = position_dodge(width = dodge_width), size = 2.5) +
  scale_shape_discrete(labels = c("*D* scores", "PI scores")) +
  scale_y_continuous(breaks = c(0, .25, .5, .75, 1), labels = c("0.00<br/>(Better)", "0.25", "0.50", "0.75", "1.00<br/>(Worse)")) +
  scale_color_viridis_d(begin = 0.3, end = 0.7, labels = c("*D* scores", "PI scores")) +
  mdthemes::md_theme_linedraw() +
  labs(x = "",
       y = "Within-subject 95% CI widths /<br/>between-subjects observed range<br/>",
       color = "Scoring method",
       shape = "Scoring method") +
  theme(legend.position = "top") +
  coord_flip(ylim = c(0, 1))

p_ci_width_proportion_observed_range_scoring

Compare CI bootstrapping methods

data_ci_width_proportions_combined_method <- 
  data_ci_width_proportions_D %>%
  mutate(domain = as.factor(domain),
         method = as.factor(method),
         method = fct_relevel(method, "bca", "basic", "normal", "percent"),
         trial_type = fct_relevel(trial_type, "tt1", "tt2", "tt3", "tt4")) %>%
  select(unique_id, domain, trial_type, method, ci_width_proportion) 

# fit model
if(file.exists("models/fit_ci_width_proportions_method.rds")) {
  
  fit_ci_width_proportions_method <- read_rds("models/fit_ci_width_proportions_method.rds")
  
} else {
  
  fit_ci_width_proportions_method <- 
    lmer(ci_width_proportion ~ trial_type * method + (trial_type | domain) + (1 | unique_id), 
         data = data_ci_width_proportions_combined_method)
  
  write_rds(fit_ci_width_proportions_method, "models/fit_ci_width_proportions_method.rds")
  
}

# results table
tab_model(fit_ci_width_proportions_method, 
          string.p = "p (corrected)", 
          ci.hyphen = ", ",
          #emph.p = FALSE,
          p.adjust = "holm")
                                      ci width proportion
Predictors                            Estimates  CI            p (corrected)
(Intercept)                            0.83      0.75, 0.90    <0.001
trial_type [tt2]                       0.06      -0.01, 0.12   1.000
trial_type [tt3]                       0.00      -0.10, 0.11   1.000
trial_type [tt4]                      -0.02      -0.10, 0.06   1.000
method [basic]                        -0.01      -0.02, -0.01  0.001
method [normal]                       -0.01      -0.02, -0.00  0.051
method [percent]                      -0.01      -0.02, -0.01  0.001
trial_type [tt2] * method [basic]      0.01      -0.00, 0.02   1.000
trial_type [tt3] * method [basic]      0.01      -0.00, 0.01   1.000
trial_type [tt4] * method [basic]      0.01      -0.00, 0.02   1.000
trial_type [tt2] * method [normal]     0.01      -0.00, 0.01   1.000
trial_type [tt3] * method [normal]     0.00      -0.00, 0.01   1.000
trial_type [tt4] * method [normal]     0.01      -0.00, 0.02   1.000
trial_type [tt2] * method [percent]    0.01      -0.00, 0.02   1.000
trial_type [tt3] * method [percent]    0.01      -0.00, 0.01   1.000
trial_type [tt4] * method [percent]    0.01      -0.00, 0.02   1.000
Random Effects
σ2                            0.01
τ00 unique_id                 0.00
τ00 domain                    0.05
τ11 domain.trial_typett2      0.04
τ11 domain.trial_typett3      0.09
τ11 domain.trial_typett4      0.06
ρ01 domain.trial_typett2      -0.52
ρ01 domain.trial_typett3      -0.54
ρ01 domain.trial_typett4      -0.68
ICC                           0.87
N domain                      35
N unique_id                   1464
Observations                  23424
Marginal R2 / Conditional R2  0.013 / 0.876
# summary
summary(fit_ci_width_proportions_method)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ci_width_proportion ~ trial_type * method + (trial_type | domain) +  
##     (1 | unique_id)
##    Data: data_ci_width_proportions_combined_method
## 
## REML criterion at convergence: -42482.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.7931 -0.4792  0.0852  0.5727  5.7306 
## 
## Random effects:
##  Groups    Name          Variance Std.Dev. Corr             
##  unique_id (Intercept)   0.004573 0.06762                   
##  domain    (Intercept)   0.051563 0.22707                   
##            trial_typett2 0.042396 0.20590  -0.52            
##            trial_typett3 0.093386 0.30559  -0.54  0.59      
##            trial_typett4 0.058856 0.24260  -0.68  0.53  0.72
##  Residual                0.007930 0.08905                   
## Number of obs: 23424, groups:  unique_id, 1464; domain, 35
## 
## Fixed effects:
##                                 Estimate   Std. Error           df t value
## (Intercept)                     0.827669     0.038526    33.977736  21.483
## trial_typett2                   0.056406     0.034981    34.341297   1.612
## trial_typett3                   0.004559     0.051774    34.100392   0.088
## trial_typett4                  -0.020482     0.041158    34.227455  -0.498
## methodbasic                    -0.013544     0.003291 21842.663130  -4.115
## methodnormal                   -0.009487     0.003291 21842.663134  -2.882
## methodpercent                  -0.013544     0.003291 21842.663133  -4.115
## trial_typett2:methodbasic       0.006058     0.004655 21842.663131   1.301
## trial_typett3:methodbasic       0.005098     0.004655 21842.663123   1.095
## trial_typett4:methodbasic       0.006675     0.004655 21842.663124   1.434
## trial_typett2:methodnormal      0.005841     0.004655 21842.663133   1.255
## trial_typett3:methodnormal      0.004693     0.004655 21842.663125   1.008
## trial_typett4:methodnormal      0.006251     0.004655 21842.663128   1.343
## trial_typett2:methodpercent     0.006058     0.004655 21842.663132   1.301
## trial_typett3:methodpercent     0.005098     0.004655 21842.663126   1.095
## trial_typett4:methodpercent     0.006675     0.004655 21842.663127   1.434
##                                         Pr(>|t|)    
## (Intercept)                 < 0.0000000000000002 ***
## trial_typett2                            0.11601    
## trial_typett3                            0.93035    
## trial_typett4                            0.62192    
## methodbasic                            0.0000389 ***
## methodnormal                             0.00395 ** 
## methodpercent                          0.0000389 ***
## trial_typett2:methodbasic                0.19312    
## trial_typett3:methodbasic                0.27345    
## trial_typett4:methodbasic                0.15158    
## trial_typett2:methodnormal               0.20954    
## trial_typett3:methodnormal               0.31336    
## trial_typett4:methodnormal               0.17934    
## trial_typett2:methodpercent              0.19312    
## trial_typett3:methodpercent              0.27345    
## trial_typett4:methodpercent              0.15158    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.00321383 (tol = 0.002, component 1)
# # plot re
# plot_model(fit_ci_width_proportions_method, 
#            colors = "bw",
#            type = "re") +
#   mdthemes::md_theme_linedraw() +
#   ggtitle("")

# extract re Tau
results_re_tau_ci_width_proportions_method <- fit_ci_width_proportions_method %>%
  merTools::REsdExtract() %>%
  as_tibble(rownames = "trial_type") %>%
  rename(tau = value) %>%
  mutate(trial_type = case_when(trial_type == "domain.(Intercept)" ~ "Trial type 1",
                                trial_type == "domain.trial_typett2" ~ "Trial type 2",  
                                trial_type == "domain.trial_typett3" ~ "Trial type 3",  
                                trial_type == "domain.trial_typett4" ~ "Trial type 4"))

# extract marginal means
results_emm_ci_width_proportions_method <- 
  summary(emmeans(fit_ci_width_proportions_method, ~ method | trial_type)) %>%
  dplyr::select(method, trial_type, estimate = emmean, se = SE, ci_lower = asymp.LCL, ci_upper = asymp.UCL) %>%
  mutate(trial_type = case_when(trial_type == "tt1" ~ "Trial type 1",
                                trial_type == "tt2" ~ "Trial type 2",
                                trial_type == "tt3" ~ "Trial type 3",
                                trial_type == "tt4" ~ "Trial type 4"),
         trial_type = fct_relevel(trial_type, "Trial type 1", "Trial type 2", "Trial type 3", "Trial type 4"))

# combine
results_ci_width_proportions_method <- results_emm_ci_width_proportions_method %>%
  full_join(results_re_tau_ci_width_proportions_method, by = "trial_type") %>%
  mutate(cr_lower = estimate - (1.96 * sqrt(se^2 + tau^2)),  # as in metafor package's implementation of credibility intervals, see metafor::predict.rma.R 
         cr_upper = estimate + (1.96 * sqrt(se^2 + tau^2))) %>%
  drop_na()

# save to disk
write_rds(results_ci_width_proportions_method, "models/results_ci_width_proportions_method.rds")

# tests
emmeans(fit_ci_width_proportions_method, list(pairwise ~ method | trial_type), adjust = "bonferroni")
## $`emmeans of method | trial_type`
## trial_type = tt1:
##  method  emmean     SE  df asymp.LCL asymp.UCL
##  bca      0.828 0.0385 Inf     0.731     0.924
##  basic    0.814 0.0385 Inf     0.718     0.910
##  normal   0.818 0.0385 Inf     0.722     0.914
##  percent  0.814 0.0385 Inf     0.718     0.910
## 
## trial_type = tt2:
##  method  emmean     SE  df asymp.LCL asymp.UCL
##  bca      0.884 0.0362 Inf     0.794     0.974
##  basic    0.877 0.0362 Inf     0.786     0.967
##  normal   0.880 0.0362 Inf     0.790     0.971
##  percent  0.877 0.0362 Inf     0.786     0.967
## 
## trial_type = tt3:
##  method  emmean     SE  df asymp.LCL asymp.UCL
##  bca      0.832 0.0449 Inf     0.720     0.944
##  basic    0.824 0.0449 Inf     0.712     0.936
##  normal   0.827 0.0449 Inf     0.715     0.940
##  percent  0.824 0.0449 Inf     0.712     0.936
## 
## trial_type = tt4:
##  method  emmean     SE  df asymp.LCL asymp.UCL
##  bca      0.807 0.0320 Inf     0.727     0.887
##  basic    0.800 0.0320 Inf     0.720     0.880
##  normal   0.804 0.0320 Inf     0.724     0.884
##  percent  0.800 0.0320 Inf     0.720     0.880
## 
## Degrees-of-freedom method: asymptotic 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 4 estimates 
## 
## $`pairwise differences of method | trial_type`
## trial_type = tt1:
##  contrast         estimate      SE  df z.ratio p.value
##  bca - basic       0.01354 0.00329 Inf  4.115  0.0002 
##  bca - normal      0.00949 0.00329 Inf  2.882  0.0237 
##  bca - percent     0.01354 0.00329 Inf  4.115  0.0002 
##  basic - normal   -0.00406 0.00329 Inf -1.233  1.0000 
##  basic - percent   0.00000 0.00329 Inf  0.000  1.0000 
##  normal - percent  0.00406 0.00329 Inf  1.233  1.0000 
## 
## trial_type = tt2:
##  contrast         estimate      SE  df z.ratio p.value
##  bca - basic       0.00749 0.00329 Inf  2.274  0.1376 
##  bca - normal      0.00365 0.00329 Inf  1.108  1.0000 
##  bca - percent     0.00749 0.00329 Inf  2.274  0.1376 
##  basic - normal   -0.00384 0.00329 Inf -1.167  1.0000 
##  basic - percent   0.00000 0.00329 Inf  0.000  1.0000 
##  normal - percent  0.00384 0.00329 Inf  1.167  1.0000 
## 
## trial_type = tt3:
##  contrast         estimate      SE  df z.ratio p.value
##  bca - basic       0.00845 0.00329 Inf  2.566  0.0617 
##  bca - normal      0.00479 0.00329 Inf  1.456  0.8717 
##  bca - percent     0.00845 0.00329 Inf  2.566  0.0617 
##  basic - normal   -0.00365 0.00329 Inf -1.110  1.0000 
##  basic - percent   0.00000 0.00329 Inf  0.000  1.0000 
##  normal - percent  0.00365 0.00329 Inf  1.110  1.0000 
## 
## trial_type = tt4:
##  contrast         estimate      SE  df z.ratio p.value
##  bca - basic       0.00687 0.00329 Inf  2.087  0.2214 
##  bca - normal      0.00324 0.00329 Inf  0.983  1.0000 
##  bca - percent     0.00687 0.00329 Inf  2.087  0.2214 
##  basic - normal   -0.00363 0.00329 Inf -1.104  1.0000 
##  basic - percent   0.00000 0.00329 Inf  0.000  1.0000 
##  normal - percent  0.00363 0.00329 Inf  1.104  1.0000 
## 
## Degrees-of-freedom method: asymptotic 
## P value adjustment: bonferroni method for 6 tests
# plot
p_ci_width_proportion_observed_range_method <- 
  ggplot(results_ci_width_proportions_method, aes(trial_type, estimate, 
                                                  color = method, 
                                                  shape = method, 
                                                  group = method)) +
  geom_linerange(aes(ymin = cr_lower, ymax = cr_upper), position = position_dodge(width = dodge_width), linetype = "dotted") +
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), size = 1, position = position_dodge(width = dodge_width)) +
  # geom_line(position = position_dodge(width = dodge_width)) +
  geom_point(position = position_dodge(width = dodge_width), size = 2.5) +
  mdthemes::md_theme_linedraw() +
  scale_y_continuous(breaks = c(0, .25, .5, .75, 1), labels = c("0.00<br/>(Better)", "0.25", "0.50", "0.75", "1.00<br/>(Worse)")) +
  scale_shape_discrete(labels = c("BCA", "Basic", "Normal", "Percentile")) +
  scale_color_viridis_d(alpha = 1, begin = 0.3, end = 0.7, labels = c("BCA", "Basic", "Normal", "Percentile")) +
  labs(shape = "Bootstrapping method",
       color = "Bootstrapping method",
       x = "",
       y = "95% CI width / observed range") +
  theme(legend.position = "top") +
  coord_flip(ylim = c(0, 1))

p_ci_width_proportion_observed_range_method

Combined plots

Plot 1

Plot 1 is merely illustrative. It shows the bootstrapped CIs for all participants, split by domain but not by trial type. D scores and the BCA method are displayed, as these are used in the subsequent analyses.

p_cis_by_domain

ggsave(filename  = "plots/figure_1_cis_by_domain.pdf",
       plot      = p_cis_by_domain,
       device    = "pdf",
       # path      = NULL,
       # dpi       = 300,
       units     = "in",
       width     = 8,
       height    = 8,
       limitsize = TRUE)

Plot 2

Most probable CI width for D scores when bootstrapped using four different methods. Results are very similar across methods. The overall Maximum A Posteriori width (MAP, i.e., the mode of a continuous variable) was D = 1.31 +/- 0.01 across bootstrapping methods, although some domains and trial types did demonstrate smaller most probable widths.
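The MAP of a continuous variable can be computed from the peak of a kernel density estimate. A minimal sketch using hypothetical widths rather than the real data (the attached bayestestR package's `map_estimate()` implements essentially this density-peak approach):

```r
# Minimal sketch of a MAP (mode) estimate for a continuous variable,
# using simulated CI widths in place of the real bootstrapped widths
set.seed(42)
ci_widths <- rnorm(1000, mean = 1.31, sd = 0.05)

d <- density(ci_widths)           # kernel density estimate
map_width <- d$x[which.max(d$y)]  # x value at the density's peak
map_width
```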

NB I elected not to meta-analyze these widths, as they demonstrate very large skew at the individual level, which violates the assumptions of linear meta-analysis and underestimates the typical width (i.e., estimated mean widths << MAP observed widths). Rather than meta-analyze, I simply report the domain- and trial-type-level MAP values. More informative and valid analyses are presented below: ones that can directly compare the D score with the PI as an alternative. That could not be accomplished by directly comparing the D and PI scores' 95% CIs, as the two metrics are on different scales and follow different distributions.

Given the minimal differences between bootstrapping methods, I employ the Bias-Corrected and Accelerated (BCA) method for all subsequent primary analyses. Sensitivity analyses comparing the bootstrapping methods are computed but not included in the main manuscript.
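For reference, the four interval types compared here correspond to the `type` argument of `boot::boot.ci()`. A toy sketch with simulated data, not the analysis code (in the real analysis the bootstrapped statistic is a participant's trial-type D score computed from their reaction times):

```r
library(boot)

# Toy sketch: bootstrap a D-like statistic (mean / SD) from simulated
# reaction-time differences and request all four interval types
set.seed(42)
x <- rnorm(36, mean = 0.3, sd = 1)

boot_out <- boot(data = x,
                 statistic = function(d, i) mean(d[i]) / sd(d[i]),
                 R = 2000)

boot.ci(boot_out, conf = 0.95, type = c("norm", "basic", "perc", "bca"))
```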

p_ci_widths

ggsave(filename  = "plots/figure_2_ci_widths.pdf",
       plot      = p_ci_widths,
       device    = "pdf",
       # path      = NULL,
       # dpi       = 300,
       units     = "in",
       width     = 8,
       height    = 6,
       limitsize = TRUE)

Plot 3

The results of three hierarchical/meta-analytic models are presented below. Each provides, via a different method, information about how informative an individual participant's D (or PI) score is: whether that participant demonstrated evidence of a bias/IRAP effect/implicit attitude, whether they can be discriminated from other individuals in the same domain using the same trial type, and how much of the range of observed scores an individual score's CI spans.

  1. A meta-analysis of the proportion of participants who show non-zero scores (i.e., whose D or PI scores' 95% CIs exclude zero).
  2. A meta-analysis of the proportion of scores that are discriminable from one another. Pairwise comparisons are made between all participants' scores within a domain and trial type; scores that lie outside one another's CIs can be inferred to be different from one another (i.e., are discriminable). This analysis is useful because it sidesteps the debate around whether a score's zero point (i.e., D = 0, PI = 0.50) is a meaningful reference point: previous research has argued that D = 0 cannot be inferred to represent no bias. Discriminability is agnostic to any individual comparison point and so avoids this issue.
  3. A meta-analysis of the ratio between each participant's score's CI width and the maximum observed range of scores for that domain and trial type, i.e., given the observed range of scores across participants, what proportion of that range each individual participant's CI spans. If each individual's CI is compatible with a large proportion of the total range of observed scores, then each score is so poorly estimated that it tells us very little about where on the spectrum that participant lies.
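The pairwise discriminability idea can be sketched as follows. This is my reading of the procedure described above, applied to three hypothetical participants' scores; it is illustrative rather than the analysis code:

```r
# Three hypothetical participants' estimates and 95% CIs
scores <- data.frame(estimate = c(0.10, 0.45, 0.80),
                     ci_lower = c(-0.40, -0.05, 0.30),
                     ci_upper = c( 0.60,  0.95, 1.30))

# All pairwise comparisons between participants
pairs <- t(combn(nrow(scores), 2))

discriminable <- apply(pairs, 1, function(p) {
  a <- scores[p[1], ]
  b <- scores[p[2], ]
  # a pair is discriminable if neither estimate falls inside the other's CI
  (a$estimate < b$ci_lower | a$estimate > b$ci_upper) &
    (b$estimate < a$ci_lower | b$estimate > a$ci_upper)
})

mean(discriminable)  # proportion of discriminable pairs (here 1 of 3)
```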

All meta-analyses compare D and PI scores to assess whether the results are dependent on the D algorithm, which has been argued to be suboptimal. That is, I assess whether this issue can be trivially resolved by scoring the data a different way.

Note that the theoretically maximum possible range of D scores is -2 to +2, but such extreme scores are practically impossible. As such, in order to interpret CI widths in terms of realistic data, and also to compare D and PI scores, which are on different scales and follow different distributions, I standardize CI widths by the observed range of scores within each domain and trial type.

# combine the three meta-analysis plots into a single column with a
# shared legend (patchwork syntax)
p_combined <- 
  p_prop_nonzero_scoring + 
  p_prop_discriminable_scoring + 
  p_ci_width_proportion_observed_range_scoring +
  plot_layout(ncol = 1, guides = "collect") & theme(legend.position = "bottom")

p_combined

ggsave(filename  = "plots/figure_3_metaanalyses.pdf",
       plot      = p_combined,
       device    = "pdf",
       # path      = NULL,
       # dpi       = 300,
       units     = "in",
       width     = 5,
       height    = 7,
       limitsize = TRUE)

Session info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] lmerTest_3.1-3   ggstance_0.3.4   emmeans_1.4.6    sjPlot_2.8.4    
##  [5] lme4_1.1-26      Matrix_1.2-18    mdthemes_0.1.0   patchwork_1.0.1 
##  [9] bayestestR_0.7.5 boot_1.3-27      kableExtra_1.3.1 knitr_1.30      
## [13] forcats_0.5.0    stringr_1.4.0    dplyr_1.0.3      purrr_0.3.4     
## [17] readr_1.3.1      tidyr_1.1.2      tibble_3.1.0     ggplot2_3.3.3   
## [21] tidyverse_1.3.0 
## 
## loaded via a namespace (and not attached):
##   [1] TH.data_1.0-10      minqa_1.2.4         colorspace_2.0-0   
##   [4] ellipsis_0.3.1      sjlabelled_1.1.7    estimability_1.3   
##   [7] snakecase_0.11.0    markdown_1.1        parameters_0.8.6   
##  [10] fs_1.4.1            gridtext_0.1.1      ggtext_0.1.0.9000  
##  [13] rstudioapi_0.13     glmmTMB_1.0.2.1     farver_2.1.0       
##  [16] fansi_0.4.2         mvtnorm_1.1-1       lubridate_1.7.9    
##  [19] xml2_1.3.2          codetools_0.2-16    splines_4.0.2      
##  [22] sjmisc_2.8.5        jsonlite_1.7.2      nloptr_1.2.2.2     
##  [25] ggeffects_0.14.3    pbkrtest_0.4-8.6    broom_0.7.2        
##  [28] dbplyr_1.4.3        broom.mixed_0.2.6   shiny_1.5.0        
##  [31] effectsize_0.4.0    compiler_4.0.2      httr_1.4.1         
##  [34] sjstats_0.18.0      backports_1.1.9     fastmap_1.0.1      
##  [37] assertthat_0.2.1    cli_2.3.1           later_1.0.0        
##  [40] htmltools_0.5.0     tools_4.0.2         coda_0.19-3        
##  [43] gtable_0.3.0        glue_1.4.2          reshape2_1.4.4     
##  [46] merTools_0.5.2      Rcpp_1.0.6          cellranger_1.1.0   
##  [49] vctrs_0.3.6         nlme_3.1-148        iterators_1.0.12   
##  [52] insight_0.10.0      xfun_0.19           rvest_0.3.5        
##  [55] mime_0.9            lifecycle_1.0.0     statmod_1.4.35     
##  [58] MASS_7.3-53         zoo_1.8-8           scales_1.1.1       
##  [61] promises_1.1.0      hms_0.5.3           sandwich_2.5-1     
##  [64] TMB_1.7.18          yaml_2.2.1          stringi_1.5.3      
##  [67] highr_0.8           foreach_1.5.0       plotrix_3.7-8      
##  [70] blme_1.0-5          rlang_0.4.10        pkgconfig_2.0.3    
##  [73] arm_1.11-1          evaluate_0.14       lattice_0.20-41    
##  [76] labeling_0.4.2      tidyselect_1.1.0    plyr_1.8.6         
##  [79] magrittr_2.0.1      R6_2.5.0            generics_0.1.0     
##  [82] multcomp_1.4-13     DBI_1.1.0           pillar_1.5.1       
##  [85] haven_2.3.1         withr_2.4.1         abind_1.4-5        
##  [88] survival_3.1-12     performance_0.4.6   janitor_2.0.1      
##  [91] modelr_0.1.8        crayon_1.4.1        utf8_1.2.1         
##  [94] rmarkdown_2.5       grid_4.0.2          readxl_1.3.1       
##  [97] reprex_0.3.0        digest_0.6.27       webshot_0.5.2      
## [100] xtable_1.8-4        numDeriv_2016.8-1.1 httpuv_1.5.2       
## [103] munsell_0.5.0       viridisLite_0.3.0